CHAPTER 1


What Is Deep Learning? 


1.1 Deep Learning 


Deep learning is a subset of machine learning which is itself a subset of artificial intelligence 
and statistics. Artificial intelligence research began shortly after World War II [24]. Early work 
was based on the knowledge of the structure of the brain, propositional logic, and Turing’s 
theory of computation. Warren McCulloch and Walter Pitts created a mathematical formulation 
for neural networks based on threshold logic. This allowed neural network research to split 
into two approaches: one centered on biological processes in the brain and the other on the 
application of neural networks to artificial intelligence. It was demonstrated that any function 
could be implemented through a set of such neurons and that a neural net could learn. In 
1948, Norbert Wiener's book, Cybernetics, was published. It described concepts in control,
communications, and statistical signal processing. The next major step in neural networks was
Donald Hebb's 1949 book, The Organization of Behavior, which connected connectivity with
learning in the brain. His book became a source for work on learning and adaptive systems. Marvin
Minsky and Dean Edmonds built the first neural computer at Harvard in 1950.

The first computer programs, and the vast majority now, have knowledge built into the 
code by the programmer. The programmer may make use of vast databases. For example, a 
model of an aircraft may use multidimensional tables of aerodynamic coefficients. The result- 
ing software therefore knows a lot about aircraft, and running simulations of the models may 
present surprises to the programmer and the users. Nonetheless, the programmatic relationships 
between data and algorithms are predetermined by the code. 

In machine learning, the relationships between the data are formed by the learning system. 
Data is input along with the results related to the data. This is the system training. The machine 
learning system relates the data to the results and comes up with rules that become part of the 
system. When new data is introduced, it can come up with new results that were not part of the 
training set. 
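As a minimal sketch of this idea (not code from the book), the following fits a straight line to
input/result pairs and then applies the fitted rule to an input that was not in the training set.
The data values and variable names are assumptions chosen for illustration, and a line fit stands
in for a real learning system.

```matlab
% Training data: inputs with their known results (illustrative values)
xTrain = [0 1 2 3 4];
yTrain = 2*xTrain + 1;          % results supplied along with the data

% "Training": polyfit finds the rule (slope and offset) from the data
p = polyfit(xTrain,yTrain,1);   % p(1) is the slope, p(2) is the offset

% New data that was not part of the training set
xNew = 10;
yNew = polyval(p,xNew);
fprintf('Learned slope %.2f, offset %.2f, prediction %.2f\n',p(1),p(2),yNew);
```

The rule (slope 2, offset 1) is never written into the code. It is recovered from the data, which
is the distinction the paragraph above draws.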

Deep learning refers to neural networks with more than one layer of neurons. The name
"deep learning" implies something more profound, and in the popular literature, it is taken
to imply that the learning system is a "deep thinker." Figure 1.1 shows a single-layer and a
multilayer network. It turns out that multilayer networks can learn things that single-layer
networks cannot. The elements of a network are nodes (where signals are combined), weights,
and biases. Biases are added at nodes. In a single layer, the inputs are multiplied by weights,
then added together at the end, after passing through a threshold function. In a multilayer or
deep learning network, the inputs are combined in the second layer before being output. There
are more weights, and the added connections allow the network to learn and solve more complex
problems.

Figure 1.1: Two neural networks. The one on the right is a deep learning network.

There are many types of machine learning. Any computer algorithm that can adapt based 
on inputs from the environment is a learning system. Here is a partial list: 


1. Neural nets (deep learning or otherwise)
2. Support vector machines
3. Adaptive control
4. System identification
5. Parameter identification (may be the same as the previous one)
6. Adaptive expert systems
7. Control algorithms (a proportional integral derivative control stores information about
   constant inputs in its integrator)


Some systems use a predefined algorithm and learn by fitting parameters of the algorithm.
Others create a model entirely from data. Deep learning systems are usually in the latter
category.

We'll give a brief history of deep learning and then move on to two examples.


1.2 History of Deep Learning 


Minsky wrote the book Perceptrons with Seymour Papert in 1969, which was an early analysis
of artificial neural networks. The book contributed to the movement toward symbolic processing
in AI. The book noted that single neurons could not implement some logical functions such
as exclusive-or (XOR) and erroneously implied that multilayer networks would have the same
issue. It was later found that three-layer networks could implement such functions. We give
the XOR solution in this book.

Multilayer neural networks were discovered in the 1960s but not really studied until the
1980s. In the 1970s, self-organizing maps using competitive learning were introduced [14]. A
resurgence in neural networks happened in the 1980s. Knowledge-based, or "expert," systems
were also introduced in the 1980s. From Jackson [16],


An expert system is a computer program that represents and reasons with knowl- 
edge of some specialized subject with a view to solving problems or giving advice. 


—Peter Jackson, Introduction to Expert Systems 


Back propagation for neural networks, a learning method using gradient descent, was rein- 
vented in the 1980s, leading to renewed progress in this field. Studies began both of human 
neural networks (i.e., the human brain) and the creation of algorithms for effective compu- 
tational neural networks. This eventually led to deep learning networks in machine learning 
applications. 

Advances were made in the 1980s as AI researchers began to apply rigorous mathematical 
and statistical analysis to develop algorithms. Hidden Markov Models were applied to speech. 
A Hidden Markov Model is a model with unobserved (i.e., hidden) states. Combined with 
massive databases, they have resulted in vastly more robust speech recognition. Machine trans- 
lation has also improved. Data mining, the first form of machine learning as it is known today, 
was developed. 

In the early 1990s, Vladimir Vapnik and coworkers invented a computationally power- 
ful class of supervised learning networks known as Support Vector Machines (SVM). These 
networks could solve problems of pattern recognition, regression, and other machine learning 
problems. 

There has been an explosion in deep learning in the past few years. New tools have been
developed that make deep learning easier to implement. TensorFlow is available on Amazon
AWS, which makes it easy to deploy deep learning in the cloud. It includes powerful visualization
tools. TensorFlow also allows you to deploy deep learning on machines that are only intermittently
connected to the Web. IBM Watson is another platform. It allows you to use TensorFlow, Keras,
PyTorch, Caffe, and other frameworks. Keras is a popular deep learning framework that can be
used in Python. All of these frameworks have allowed deep learning to be deployed just about
everywhere.

In this book, we will present MATLAB-based deep learning tools. These powerful tools let 
you create deep learning systems to solve many different problems. In our book, we will apply 
MATLAB deep learning to a wide range of problems ranging from nuclear fusion to classical 
ballet. 

Before getting into our examples, we will give some fundamentals on neural nets. We will
first give background on neurons and how an artificial neuron represents a real neuron. We
will then design a daylight detector. We will follow this with the famous XOR problem that
stopped neural net development for some time. Finally, we will discuss the examples in this
book.


1.3 Neural Nets


Neural networks, or neural nets, are a popular way of implementing machine "intelligence."
The idea is that they behave like the neurons in a brain. In this section, we will explore how
neural nets work, starting with the most fundamental idea, a single neuron, and working our
way up to a multilayer neural net. Our example for this will be a pendulum. We will show how
a neural net can be used to solve the prediction problem. Prediction is one of the two uses of a
neural net; the other is classification. We'll start with a simple classification example.

Let's first look at a single neuron with two inputs. This is shown in Figure 1.2. This neuron
has inputs $x_1$ and $x_2$, a bias $b$, weights $w_1$ and $w_2$, and a single output $z$. The
activation function $\sigma$ takes the weighted input and produces the output. In this diagram,
we explicitly add icons for the multiplication and addition steps within the neuron, but in
typical neural net diagrams such as Figure 1.1, they are omitted.


$$z = \sigma(y) = \sigma(w_1 x_1 + w_2 x_2 + b) \tag{1.1}$$
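
Equation 1.1 can be sketched in a few lines of MATLAB. The weights, bias, and inputs below are
assumed values chosen only for illustration, and tanh is used as one possible activation function.

```matlab
% Two-input neuron: z = sigma(w1*x1 + w2*x2 + b)
sigma = @(y) tanh(y);     % activation function (tanh chosen for illustration)
w     = [0.5 -0.25];      % weights w1 and w2 (assumed values)
b     = 0.1;              % bias (assumed value)
x     = [1; 2];           % inputs x1 and x2 (assumed values)

y = w*x + b;              % weighted sum of the inputs plus the bias
z = sigma(y);             % neuron output
fprintf('y = %.3f, z = %.3f\n',y,z);
```

Writing the weights as a row vector and the inputs as a column vector turns the weighted sum into
a single matrix product, which extends directly to a layer with many neurons.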


Let's compare this with a real neuron as shown in Figure 1.3. A real neuron has multiple
inputs via the dendrites. Some of these branch, which means that multiple inputs can connect
to the cell body through the same dendrite. The output is via the axon. Each neuron has one
output. The axon connects to a dendrite of the next neuron through a synapse, and signals pass
from the axon to the dendrite across it.

There are numerous commonly used activation functions. We show three:


$$\sigma(y) = \tanh(y) \tag{1.2}$$
$$\sigma(y) = \frac{2}{1+e^{-y}} - 1 \tag{1.3}$$
$$\sigma(y) = y \tag{1.4}$$


The exponential one is normalized and offset from zero so it ranges from -1 to 1. The last
one, which simply passes through the value of y, is called the linear activation function. The
following code in the script OneNeuron.m computes and plots these three activation functions
for an input y. Figure 1.4 shows the three activation functions on one plot.


Figure 1.2: A two-input neuron.


Figure 1.3: A neuron connected to a second neuron. A real neuron can have 10,000 inputs!


Figure 1.4: The three activation functions from OneNeuron. The plot, titled Activation Functions,
shows Output against Input for the Tanh, Exp, and Linear curves.


OneNeuron.m

%% Single neuron demonstration.
%% Look at the activation functions
y  = linspace(-4,4);
z1 = tanh(y);
z2 = 2./(1+exp(-y)) - 1;

PlotSet(y,[z1;z2;y],'x label','Input','y label','Output',...
  'figure title','Activation Functions','plot title','Activation Functions',...
  'plot set',{[1 2 3]},'legend',{{'Tanh','Exp','Linear'}});


Activation functions that saturate, or reach a value of input after which the output is constant 
or changes very slowly, model a biological neuron that has a maximum firing rate. These 
particular functions also have good numerical properties that are helpful in learning. 
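
The exponential activation function used in OneNeuron.m, 2/(1 + exp(-y)) - 1, is closely related
to the hyperbolic tangent. The short derivation below is an added worked step, not taken from the
book, and shows that it is the tanh evaluated at half the input.

```latex
\frac{2}{1+e^{-y}} - 1
  = \frac{1 - e^{-y}}{1 + e^{-y}}
  = \frac{e^{y/2} - e^{-y/2}}{e^{y/2} + e^{-y/2}}
  = \tanh\!\left(\frac{y}{2}\right)
```

The two curves therefore have the same shape and the same limits of -1 and 1, with the exponential
form rising half as steeply near zero.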

Let's look at a single input neural net shown in Figure 1.5. This neuron is


$$z = \sigma(2x + 3) \tag{1.5}$$


where the weight w on the single input x is 2 and the bias b is 3. If the activation function is
linear, the neuron is just a linear function of x,


$$z = y = 2x + 3 \tag{1.6}$$


Neural nets do make use of linear activation functions, often in the output layer. It is the 
nonlinear activation functions that give neural nets their unique capabilities. 

Let's look at the output with the preceding activation functions plus the threshold function 
from the script LinearNeuron.m. The results are in Figure 1.6. 


Figure 1.5: A one-input neural net. The weight w is 2 and the bias b is 3.


Figure 1.6: The "linear" neuron compared to other activation functions from LinearNeuron. The
plot, titled Linear Neuron, shows the Tanh, Exp, Threshold, and Linear curves.

LinearNeuron.m

%% Linear neuron demo
x  = linspace(-4,2,1000);
y  = 2*x + 3;
z1 = tanh(y);
z2 = 2./(1+exp(-y)) - 1;
z3 = zeros(1,length(x));

% Apply a threshold
k     = y >= 0;
z3(k) = 1;

PlotSet(x,[z1;z2;z3;y],'x label','x','y label','y',...
  'figure title','Linear Neuron','plot title','Linear Neuron',...
  'plot set',{[1 2 3 4]},'legend',{{'Tanh','Exp','Threshold','Linear'}});


The tanh and exp are very similar. They put bounds on the output. Within the range
$-3 < x < 1$, they return the function of the input. Outside those bounds, they return the sign of
the input, that is, they saturate. The threshold function returns 0 if the value of y is less than
0, which happens when x is less than -1.5, and 1 otherwise. The threshold is saying the output is
only important, thus activated, if the input exceeds a given value. The other nonlinear activation
functions are saying that we care about the value of the linear equation only within the bounds.
The nonlinear functions (but not the step) make it easier for the learning algorithms since the
functions have derivatives. The binary step has a discontinuity at an input of zero so that its
derivative is infinite at that point. Aside from the linear function (which is usually used on
output neurons), the neurons are just telling us that the sign of the linear equation is all we
care about. The activation function is what makes a neuron a neuron.

We now show two brief examples of neural nets: first, a daylight detector, and second, the 
exclusive-or problem. 


1.3.1 Daylight Detector 


Problem 
We want to use a simple neural net to detect daylight. This will provide an example of using a 
neural net for classification. 


Solution 

Historically, the first neuron was the perceptron. This is a neuron with an activation function
that is a threshold. Its output is either 0 or 1. This is not really useful for many real-world
problems. However, it is well suited for simple classification problems. We will use a single
perceptron in this example.


How It Works 


Suppose our input is a light level measured by a photo cell. If you weight the input so that 1 is 
the value defining the brightness level at twilight, you get a sunny day detector. 

This is shown in the following script, SunnyDay. The script is named after the famous 
neural net that was supposed to detect tanks but instead detected sunny days; this was due to 
all the training photos of tanks being taken, unknowingly, on a sunny day, while all the photos 
without tanks were taken on a cloudy day. The solar flux is modeled using a cosine and scaled 
so that it is 1 at noon. Any value greater than 0 is daylight. 


SunnyDay.m

%% The data
t = linspace(0,24);          % time, in hours
d = zeros(1,length(t));
s = cos((2*pi/24)*(t-12));   % solar flux model

%% The activation function
% The nonlinear activation function which is a threshold detector
j    = s < 0;
s(j) = 0;
j    = s > 0;
d(j) = 1;

%% Plot the results
PlotSet(t,[s;d],'x label','Hour','y label',...
  {'Solar Flux','Day/Night'},'figure title','Daylight Detector',...
  'plot title',{'Flux Model','Perceptron Output'});
set([subplot(2,1,1) subplot(2,1,2)],'xlim',[0 24],'xtick',[0 6 12 18 24]);


Figure 1.7: The daylight detector. The top plot shows the input data, and the bottom plot shows
the perceptron output detecting daylight.


Figure 1.7 shows the detector results. The set call at the end of the script sets the x-axis
limits and ticks so that the axis ends at exactly 24 hours. This is a really trivial example but
does show how classification works. If we had multiple neurons with thresholds set to detect
sunlight levels within bands of solar flux, we would have a neural net sun clock.
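
The sun clock idea can be sketched by giving each threshold neuron a different flux level. This is
an illustrative extension rather than code from the book; the threshold values and the variable
names are assumptions.

```matlab
t = linspace(0,24,97);                 % time, in hours (15 minute steps)
s = max(cos((2*pi/24)*(t-12)),0);      % solar flux model, zero at night

% One threshold neuron per flux level (band edges are assumed values)
thresholds = [0.25 0.5 0.75];
fired      = zeros(length(thresholds),length(t));
for i = 1:length(thresholds)
  fired(i,:) = s > thresholds(i);      % perceptron output, 0 or 1
end

band = sum(fired,1);                   % number of neurons firing, 0 to 3
fprintf('Band at noon: %d, band at midnight: %d\n',band(49),band(1));
```

The number of neurons that fire identifies the flux band. Because the flux model is symmetric about
noon, the band alone cannot tell morning from afternoon, so a real clock would need extra
information such as whether the flux is rising or falling.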


1.3.2 XOR Neural Net 


Problem 
We want to implement the exclusive-or (XOR) problem with a neural network. 


Solution 

The XOR problem impeded the development of neural networks for a long time before "deep
learning" was developed. Look at Figure 1.8. The table on the left gives all possible inputs
A and B and the desired outputs C. "Exclusive-or" just means that if the inputs A and B are
different, the output C is 1. The figure shows a single-layer network and a multilayer network,
as in Figure 1.1, but with the weights labeled as they will be in the code. You can implement
this in MATLAB easily, in just seven lines:

>> a = 1;
>> b = 0;
>> if( a == b )
>>   c = 0
>> else
>>   c = 1
>> end


Figure 1.8: Exclusive-or (XOR) truth table and possible solution networks. The panels are the
truth table, the single-layer network, and the multilayer "deep" network.

C = XOR(A,B)
A  B  C
0  0  0
0  1  1
1  0  1
1  1  0


This type of logic was embodied in medium-scale integrated circuits in the early days of digital 
systems and in tube-based computers even earlier than that. Try as you might, you cannot pick 
two weights and a bias on the single-layer network to reproduce the XOR. Minsky created a 
proof that it was impossible. 
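
A brute-force search illustrates the claim, although it is not a proof. The sketch below tries
every combination of two weights and a bias on an assumed grid of values and checks whether a
single threshold neuron reproduces the truth table.

```matlab
a = [0 1 0 1];  b = [0 0 1 1];  c = [0 1 1 0];   % XOR truth table
g = -2:0.25:2;                                   % candidate values (assumed grid)

found = false;
for w1 = g
  for w2 = g
    for bias = g
      out = double(w1*a + w2*b + bias > 0);      % single threshold neuron
      if( isequal(out,c) )
        found = true;
      end
    end
  end
end
fprintf('Single-layer solution found: %d\n',found);
```

No combination works. The reason is that the case (0,0) requires the bias to be at most zero, the
cases (1,0) and (0,1) require each weight plus the bias to be positive, and together these force
the sum for (1,1) to be positive as well, so the output there cannot be 0.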

The second neural net, the deep neural net, can reproduce the XOR. We will implement and 
train this network. 


How It Works 

What we will do is explicitly write out the back propagation algorithm that trains the neural net
from the four training sets given in Figure 1.8, that is, (0,0), (1,0), (0,1), (1,1). We'll write it in
the script XORDemo. The point is to show you explicitly how back propagation works. We will
use the tanh as the activation function in this example. The XOR function is given in XOR.m
shown as follows.


XOR.m

%% XOR Implement an 'Exclusive Or' neural net
% c = XOR(a,b,w)
%% Description
% Implements an XOR function in a neural net. It accepts vector inputs.
%% Inputs
%  a (1,:) Input 1
%  b (1,:) Input 2
%  w (9,1) Weights and biases
%% Outputs
%  c (1,:) Output

function [y3,y1,y2] = XOR(a,b,w)

if( nargin < 1 )
  Demo
  return
end

y1 = tanh(w(1)*a + w(2)*b + w(7));
y2 = tanh(w(3)*a + w(4)*b + w(8));
y3 = w(5)*y1 + w(6)*y2 + w(9);
c  = y3;


There are three neurons. The activation function for the hidden layer is the hyperbolic 
tangent. The activation function for the output layer is linear. 


$$y_1 = \tanh(w_1 a + w_2 b + w_7) \tag{1.7}$$
$$y_2 = \tanh(w_3 a + w_4 b + w_8) \tag{1.8}$$
$$y_3 = w_5 y_1 + w_6 y_2 + w_9 \tag{1.9}$$


Now we will derive the back propagation routine. The hyperbolic activation function is


$$f(z) = \tanh(z) \tag{1.10}$$

Its derivative is

$$\frac{df(z)}{dz} = 1 - f^2(z) \tag{1.11}$$
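
The derivative in Equation 1.11 can be checked numerically. The following is a small sketch, with
the sample points and step size chosen as assumptions, that compares the analytic expression with
a central difference estimate.

```matlab
z  = linspace(-2,2,9);                   % sample points (assumed values)
h  = 1e-6;                               % finite-difference step (assumed)
dN = (tanh(z+h) - tanh(z-h))/(2*h);      % central difference estimate
dA = 1 - tanh(z).^2;                     % analytic derivative, Equation 1.11
maxErr = max(abs(dN - dA));
fprintf('Largest difference: %.2e\n',maxErr);
```

The fact that the derivative is written in terms of the function value itself is convenient in back
propagation, because the neuron output has already been computed in the forward pass.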


In this derivation, we are going to use the chain rule. Assume that F is a function of y, which is
a function of x. Then

$$\frac{dF(y(x))}{dx} = \frac{dF}{dy}\frac{dy}{dx} \tag{1.12}$$


The error is the square of the difference between the desired output and the output. This is
known as a quadratic error. It is easy to use because the derivative is simple and the error is
always positive, making the lowest error the one closest to zero.


$$E = \frac{1}{2}(c - y_3)^2 \tag{1.13}$$


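The derivative of the error with respect to $w_j$ for the output node is

$$\frac{\partial E}{\partial w_j} = (y_3 - c)\frac{\partial y_3}{\partial w_j} \tag{1.14}$$

For the hidden nodes, it is

$$\frac{\partial E}{\partial w_j} = \psi_3\frac{\partial n_3}{\partial w_j} \tag{1.15}$$


Expanding for all the weights

$$\frac{\partial E}{\partial w_1} = \psi_3\psi_1 a \tag{1.16}$$
$$\frac{\partial E}{\partial w_2} = \psi_3\psi_1 b \tag{1.17}$$
$$\frac{\partial E}{\partial w_3} = \psi_3\psi_2 a \tag{1.18}$$
$$\frac{\partial E}{\partial w_4} = \psi_3\psi_2 b \tag{1.19}$$
$$\frac{\partial E}{\partial w_5} = \psi_3 y_1 \tag{1.20}$$
$$\frac{\partial E}{\partial w_6} = \psi_3 y_2 \tag{1.21}$$
$$\frac{\partial E}{\partial w_7} = \psi_3\psi_1 \tag{1.22}$$
$$\frac{\partial E}{\partial w_8} = \psi_3\psi_2 \tag{1.23}$$
$$\frac{\partial E}{\partial w_9} = \psi_3 \tag{1.24}$$

where

$$\psi_1 = 1 - f^2(n_1) \tag{1.25}$$
$$\psi_2 = 1 - f^2(n_2) \tag{1.26}$$
$$\psi_3 = y_3 - c \tag{1.27}$$
$$n_1 = w_1 a + w_2 b + w_7 \tag{1.28}$$
$$n_2 = w_3 a + w_4 b + w_8 \tag{1.29}$$
$$n_3 = w_5 y_1 + w_6 y_2 + w_9 \tag{1.30}$$

In Equation 1.15, the exact chain rule gives $\partial n_3/\partial w_1 = w_5\psi_1 a$, and likewise
a factor of $w_5$ or $w_6$ in every hidden-layer term. Equations 1.16 to 1.19, 1.22, and 1.23 omit
that factor, as does the training code, so the update they define approximates the exact gradient.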
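
A common way to validate a hand-derived gradient is to compare it with finite differences. The
sketch below does this for the network of Equations 1.7 to 1.9 and the quadratic error of
Equation 1.13. It is not code from the book: the test weights are assumed values, and the analytic
gradient is written with the exact chain rule, which carries a factor of w5 or w6 on the
hidden-layer terms.

```matlab
% Network from Equations 1.7 to 1.9, written as anonymous functions
y1f = @(a,b,w) tanh(w(1)*a + w(2)*b + w(7));
y2f = @(a,b,w) tanh(w(3)*a + w(4)*b + w(8));
y3f = @(a,b,w) w(5)*y1f(a,b,w) + w(6)*y2f(a,b,w) + w(9);
E   = @(a,b,c,w) 0.5*(c - y3f(a,b,w))^2;          % quadratic error

w = [0.2;-0.3;0.4;0.1;-0.5;0.6;0.05;-0.1;0.3];    % assumed test weights
a = 1; b = 1; c = 0;                              % one training case

% Analytic gradient from the chain rule, including w5 and w6
y1   = y1f(a,b,w);
y2   = y2f(a,b,w);
psi1 = 1 - y1^2;
psi2 = 1 - y2^2;
psi3 = y3f(a,b,w) - c;
gA   = psi3*[w(5)*psi1*a; w(5)*psi1*b; w(6)*psi2*a; w(6)*psi2*b; ...
             y1; y2; w(5)*psi1; w(6)*psi2; 1];

% Numerical gradient by central differences
gN = zeros(9,1);
h  = 1e-6;
for j = 1:9
  dw    = zeros(9,1);
  dw(j) = h;
  gN(j) = (E(a,b,c,w+dw) - E(a,b,c,w-dw))/(2*h);
end
maxDiff = max(abs(gA - gN));
fprintf('Largest difference: %.2e\n',maxDiff);
```

If the w5 and w6 factors are removed from gA, the hidden-layer entries no longer match the
numerical gradient, which makes this kind of check a quick way to catch a dropped term.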


You can see from the derivation how this could be made recursive and apply to any number of
outputs or layers. Our weight adjustment at each step will be

$$\Delta w_j = -\eta\frac{\partial E}{\partial w_j} \tag{1.31}$$

where $\eta$ is the update gain. It should be a small number. We only have four sets of inputs. We
will apply them multiple times to get the XOR weights.

Our back propagation trainer needs to find the nine elements of w. The training function
XORTraining.m is shown as follows.


XORTraining.m

%% XORTRAINING Implements an XOR training function.
%% Inputs
%   a    (1,4)  Input 1
%   b    (1,4)  Input 2
%   c    (1,4)  Output
%   w    (9,1)  Weights and biases
%   n    (1,1)  Number of iterations through all 4 inputs
%   eta  (1,1)  Training weight
%
%% Outputs
%   w    (9,1)  Weights and biases
%% See also
% XOR

function w = XORTraining(a,b,c,w,n,eta)

if( nargin < 1 )
  Demo;
  return
end

e      = zeros(4,1);
y3     = XOR(a,b,w);
e(:,1) = y3 - c;
wP     = zeros(10,n+1); % For plotting the weights
for k = 1:n
  wP(:,k) = [w;mean(abs(e))];
  for j = 1:4
    [y3,y1,y2] = XOR(a(j),b(j),w);
    psi1       = 1 - y1^2;
    psi2       = 1 - y2^2;
    e(j)       = y3 - c(j);
    psi3       = e(j); % Linear activation function
    dW         = psi3*[psi1*a(j);psi1*b(j);psi2*a(j);psi2*b(j);y1;y2;psi1;psi2;1];
    w          = w - eta*dW;
  end
end
wP(:,k+1) = [w;mean(abs(e))];

% For legend entries
wName = string;
for k = 1:length(w)
  wName(k) = sprintf('W_%d',k);
end
leg{1} = wName;
leg{2} = '';

PlotSet(0:n,wP,'x label','step','y label',{'Weight' 'Error'},...
  'figure title','Learning','legend',leg,'plot set',{1:9 10});


The first two arguments to PlotSet are the data and are the minimum required. The
remainder are parameter pairs. The leg value has legends for the two plots, as defined by
'plot set'. The first plot uses the first nine data points, in this case, the weights. The
second plot uses the last data point, the mean of the error. leg is a cell array with two strings
or string arrays. The 'plot set' is two arrays in a cell. A plot with only one value will not
generate a legend.

The demo script XORDemo.m starts with the training data, which is actually the complete
truth data for this simple function, and randomly generated weights. It iterates through the
inputs 25000 times, with a training weight of 0.001.


XORDemo.m

% Training data - also the truth data
a = [0 1 0 1];
b = [0 0 1 1];
c = [0 1 1 0];

% First try implementing random weights
w0 = [ 0.1892; 0.2482; -0.0495; -0.4162; -0.2710;...
       0.4133; -0.3476; 0.3258; 0.0383];
cR = XOR(a,b,w0);

fprintf('\nRandom Weights\n')
fprintf('    a     b     c\n');
for k = 1:4
  fprintf('%5.0f %5.0f %5.2f\n',a(k),b(k),cR(k));
end

% Now execute the training
w = XORTraining(a,b,c,w0,25000,0.001);
c = XOR(a,b,w);


The results of the neural network with random weights and biases, as expected, are not
good. After training, the neural network reproduces the XOR problem very well, as shown in
the following demo output. Now, if you change the initial weights and biases, you may find
that you get bad results. This is because the simple gradient method implemented here can fall
into local minima from which it can't escape. This is an important point about finding the best
answer. There may be many good answers, which are local optima, but there will be only one
best answer, the global optimum. There is a vast body of research on how to guarantee that a
solution is a global optimum.


>> XORDemo

Random Weights
    a     b     c
    0     0  0.26
    1     0  0.19
    0     1  0.03
    1     1 -0.04

Weights and Biases
   Initial     Final
    0.1892    1.7933
    0.2482    1.8155
   -0.0495   -0.8535
   -0.4162   -0.8591
   -0.2710    1.3744
    0.4133    1.4893
   -0.3476   -0.4974
    0.3258    1.1124
    0.0383   -0.5634

Trained
    a     b     c
    0     0  0.00
    1     0  1.00
    0     1  1.00
    1     1  0.00


Figure 1.9 shows the weights and biases converging and also shows the mean output error
over all four inputs in the truth table going to zero. If you try other starting weights and biases,
this may not be the case. Other solution methods, such as Genetic Algorithms [13],
Electromagnetism based [4], and Simulated Annealing [23], are less susceptible to falling into local
minima but can be slow. A good overview of optimization specifically for machine learning
is given by Bottou [6].

In the next chapter, we will use the deep learning toolbox to solve this problem. 

You might wonder how this compares to a set of linear equations. If we remove the activation
functions, we get

$$y_3 = w_9 + w_6 w_8 + w_5 w_7 + a(w_1 w_5 + w_3 w_6) + b(w_2 w_5 + w_4 w_6) \tag{1.32}$$

This reduces to just three independent coefficients.

$$y_3 = k_1 + k_2 a + k_3 b \tag{1.33}$$

One is a constant, and the other two multiply the inputs. Writing the four possible cases in
matrix notation, we get

$$\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} = k_1 +
\begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} k_2 \\ k_3 \end{bmatrix} \tag{1.34}$$


Figure 1.9: Evolution of weights during exclusive-or (XOR) training. The top plot shows the nine
weights and the bottom plot shows the error, both against the step number.


We can get close to a working XOR if we choose

$$\begin{bmatrix} k_1 \\ k_2 \\ k_3 \end{bmatrix} =
\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \tag{1.35}$$

This makes three out of four equations correct; only the case a = b = 1 fails, giving 2 instead
of 0. There is no way to make all four correct with just three coefficients, because the first
three cases fix the coefficients at these values and the fourth case then contradicts them. The
activation functions separate the coefficients and allow us to reproduce the XOR. This is not
surprising because the XOR is not a linear problem.
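
The claim that three coefficients cannot reproduce the XOR can be checked with a least-squares
fit. The sketch below, which is not from the book, stacks the four cases into a matrix whose rows
are [1 a b] and solves for the best coefficients in the least-squares sense.

```matlab
% Each row is [1 a b] and multiplies the coefficients [k1;k2;k3]
A = [1 0 0;
     1 1 0;
     1 0 1;
     1 1 1];
c = [0;1;1;0];          % XOR outputs for the four cases

k   = A\c;              % least-squares coefficients
res = A*k - c;          % residual for each case
disp(k');
disp(res');
```

The best linear fit is the constant 0.5, with both input coefficients equal to zero, and every
case is off by 0.5. A linear network therefore does no better than guessing the average output.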


1.4 Deep Learning and Data 


Deep learning systems operate on data. Data may be organized in many ways. For example, 
we may want a deep learning system to identify an image. A color image that is 2 pixels by 2 
pixels by 3 colors could be represented with random data using rand. 


>> x = rand(2,2,3)
x(:,:,1) =
    0.9572    0.8003
    0.4854    0.1419
x(:,:,2) =
    0.4218    0.7922
    0.9157    0.9595
x(:,:,3) =
    0.6557    0.8491
    0.0357    0.9340


The array form implies a structure for the data. The same number of points could be organized
into a single vector using reshape.


>> reshape(x,12,1)
ans =
    0.9572
    0.4854
    0.8003
    0.1419
    0.4218
    0.9157
    0.7922
    0.9595
    0.6557
    0.0357
    0.8491
    0.9340
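
The ordering that reshape uses is easier to see with known values than with random ones. The
following sketch, with an array filled with the integers 1 to 12 as an assumed example, shows that
reshape walks down the columns of each page in turn and that the operation can be undone.

```matlab
x     = reshape(1:12,2,2,3);   % known values make the ordering easy to see
v     = reshape(x,12,1);       % the same 12 numbers as a column vector
xBack = reshape(v,2,2,3);      % and back to the original array

disp(x(:,:,2));                % second 2-by-2 page: [5 7; 6 8]
disp(v');                      % 1 through 12 in order
```

Because no values are changed, a network that expects a vector input can be fed image data this
way, although the spatial relationship between neighboring pixels is then no longer explicit.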


The numbers are the same; they are just organized differently. Convolutional neural networks,
described in the next section, are often used for image structured data. We might also
have a vector:


>> rand(2,1)
ans =
    0.6787
    0.7577


for which we wish to learn a temporal or time sequence. In this case, if each column is a
time sample, we might have


>> rand(2,4)
ans =
    0.7431    0.6555    0.7060    0.2769
    0.3922    0.1712    0.0318    0.0462


For example, we might want to look at an ongoing sequence of samples and determine if 
a set of k samples matches a predetermined sequence. For this simple problem, the neural net 
would learn the sequence and then be fed sets of four samples to match. 

We also need to distinguish what we mean by matching. If all of our numbers are exact, the 
problem is relatively straightforward. In real systems, measurements are often noisy. In those 
cases, we want to match with a certain probability. This leads to the concept of statistical neural 
nets. 
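
To make the idea of matching noisy data concrete, the sketch below slides a window over a stream
of samples and declares a match when every sample is within a tolerance of the target sequence.
This is a plain threshold test rather than a neural net or a statistical model, and the target
sequence, noise level, and tolerance are all assumed values.

```matlab
rng(1);                                      % repeatable noise
target = [0.2 0.9 0.4 0.7];                  % predetermined sequence (assumed)
stream = [0.5 0.1 target 0.3 0.8];           % samples that contain the sequence
noisy  = stream + 0.01*randn(size(stream));  % measurements with noise

k     = length(target);
tol   = 0.1;                                 % allowed deviation per sample (assumed)
match = false(1,length(noisy)-k+1);
for i = 1:length(match)
  match(i) = all(abs(noisy(i:i+k-1) - target) < tol);
end
disp(find(match));                           % start index of each match
```

With exact numbers the tolerance could be zero. With noise, the tolerance has to be chosen from
the noise statistics, which is where a probabilistic treatment comes in.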


1.5 Types of Deep Learning 


There are many types of deep learning networks. New types are under development as you read 
this book. One deep learning researcher joked that if you randomly put together four letters, 
you will have the name for an existing deep learning algorithm. 

The following sections briefly describe some of the major types. 


1.5.1 Multilayer Neural Network 
Multilayer neural networks have 


1. Input neurons 
2. Multiple layers of hidden neurons 


3. Output neurons 


The different layers may have different activation functions. They may also be functionally 
different such as being a convolution or pooling layer. In a later chapter, we will introduce the 
idea of an algorithmic layer. 


1.5.2 Convolutional Neural Networks (CNN) 


A CNN has convolutional layers (hence the name). It convolves a feature with the input matrix 
so that the output emphasizes that feature. This effectively finds patterns. For example, you 
might convolve an L pattern with the incoming data to find corners. The human eye has edge 
detectors, making the human vision system a convolutional neural network of sorts. 
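
The corner-finding example can be sketched with conv2 from base MATLAB. The image and the
2-by-2 L-shaped feature below are assumed values chosen for illustration; a real CNN learns its
features during training rather than having them written by hand.

```matlab
img        = zeros(6,6);
img(2:5,2) = 1;                 % vertical stroke
img(5,2:5) = 1;                 % horizontal stroke, forming an L

feature = [1 0;
           1 1];                % 2-by-2 L-shaped feature (assumed)

% conv2 flips the kernel, so rotate it by 180 degrees first to slide
% the feature over the image unflipped (cross-correlation)
response = conv2(img,rot90(feature,2),'valid');
[~,idx]  = max(response(:));
[r,c]    = ind2sub(size(response),idx);
fprintf('Strongest response at row %d, column %d\n',r,c);
```

The response is largest where the image patch lines up with the feature, which here is the corner
of the L with its top-left pixel at row 4, column 2.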


1.5.3 Recurrent Neural Network (RNN) 


Recurrent neural networks are a type of recursive neural network. Recurrent neural networks
are used for time-dependent problems. They combine the last time step's data with the data
from the hidden or intermediate layer, to produce a representation of the current time step. A
recurrent neural net has a loop. An input vector at time k is used to create an output which
is then passed to the next element of the network. This is done recursively in that each stage
is identical with external inputs and inputs from the previous stage. Recurrent neural nets are
used in speech recognition, language translation, and many other applications. One can see
how a recurrent network would be useful in translation in that the meaning of the latter part of
an English sentence can be dependent on the beginning. Now this presents a problem. Suppose
we are translating a paragraph. Is the output of the first stage necessarily relevant to the 100th
stage? Maybe not. In standard estimation, old data is forgotten using a forgetting factor. In
neural nets, we use long short-term memory networks, or LSTM networks.
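
The loop in a recurrent network can be sketched as follows. This uses a generic update in which
the new hidden state is the tanh of a weighted sum of the current input and the previous hidden
state; that form, the sizes, and the random weights are assumptions for illustration and are not
taken from the book.

```matlab
rng(0);                          % repeatable random weights
nIn = 2; nHid = 3; nSteps = 5;   % sizes (assumed)
Wx = 0.5*randn(nHid,nIn);        % weights on the external input
Wh = 0.5*randn(nHid,nHid);       % weights on the previous hidden state
bh = zeros(nHid,1);              % bias
x  = randn(nIn,nSteps);          % one input vector per time step

h = zeros(nHid,nSteps+1);        % h(:,1) is the initial state
for k = 1:nSteps
  % Every stage is identical: the same weights are applied at each step
  h(:,k+1) = tanh(Wx*x(:,k) + Wh*h(:,k) + bh);
end
disp(h(:,end)');                 % state after the last time step
```

The state after the last step depends on every earlier input through the chain of Wh products,
which is why information from the first stage can still influence a much later one.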


1.5.4 Long Short-Term Memory Networks (LSTMs) 


LSTMs are designed to avoid the dependency on old information. A standard RNN has a 
repeating structure. An LSTM also has a repeating structure, but each element has four layers. 
The LSTM layers decide what old information to pass on to the next layer. It may be all, or it 
may be none. There are many variants on LSTM, but they all include the fundamental ability 
to forget things. 
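
One element of an LSTM can be sketched with the commonly used formulation of four layers: a
forget gate, an input gate, a candidate memory, and an output gate. The text does not give
equations, so this formulation, the sizes, and the random weights are all assumptions.

```matlab
rng(0);                                   % repeatable random weights
nIn = 2; nHid = 3;                        % sizes (assumed)
sig = @(v) 1./(1+exp(-v));                % logistic function used by the gates
W   = 0.5*randn(4*nHid,nHid+nIn);         % stacked weights for the four layers
bz  = zeros(4*nHid,1);                    % stacked biases

h = zeros(nHid,1);                        % hidden state
c = zeros(nHid,1);                        % cell memory
x = randn(nIn,4);                         % four time steps of input
for k = 1:size(x,2)
  v  = W*[h;x(:,k)] + bz;
  fg = sig(v(1:nHid));                    % forget gate: how much old memory to keep
  ig = sig(v(nHid+1:2*nHid));             % input gate: how much new memory to add
  g  = tanh(v(2*nHid+1:3*nHid));          % candidate memory
  og = sig(v(3*nHid+1:4*nHid));           % output gate
  c  = fg.*c + ig.*g;                     % fg near 0 forgets, fg near 1 passes all
  h  = og.*tanh(c);
end
disp(h');
```

The forget gate takes values between 0 and 1 for each memory element, which corresponds to the
statement above that the old information passed on may be all, none, or something in between.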


1.5.5 Recursive Neural Network 


This is often confused with recurrent neural networks (RNNs), which are a type of recursive 
neural network. Recursive neural networks operate on structured data. They’ve been used 
successfully on language processing as language is structured (as opposed to images which 
are not). 


1.5.6 Temporal Convolutional Machines (TCMs) 


The TCM is a convolutional architecture designed to learn temporal sequences [19]. TCMs 
are particularly useful for statistical modeling of temporal sequences. Statistical modeling is 
appropriate when incoming data is noisy. 


1.5.7 Stacked Autoencoders 


A stacked autoencoder is a neural net made up of a series of sparse autoencoders. An
autoencoder is a type of neural network that is an unsupervised learning algorithm using back
propagation. Sparsity is a measure of how many neurons are activated, that is, have inputs that
cause them to produce an output for a given activation function. The outputs of one layer feed
into the next. The number of nodes tends to decrease as you move from input to output.


1.5.8 Extreme Learning Machine (ELM) 


ELMs were invented by Guang-Bin Huang [15]. An ELM is a single hidden layer feedforward
network. It randomly chooses the weights of the hidden nodes and analytically computes the
weights of the output nodes. ELMs provide good performance and learn quickly.
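
The two steps, random hidden weights and analytically computed output weights, can be sketched
on the XOR data. This is an illustrative sketch rather than Huang's reference implementation; the
number of hidden nodes, the tanh activation, and the use of pinv for the least-squares solution
are assumptions.

```matlab
rng(0);                              % repeatable random hidden weights
X = [0 0; 1 0; 0 1; 1 1];            % inputs, one case per row
c = [0; 1; 1; 0];                    % XOR targets

nHid = 10;                           % number of hidden nodes (assumed)
W    = randn(2,nHid);                % random hidden weights, never trained
bH   = randn(1,nHid);                % random hidden biases
H    = tanh(X*W + repmat(bH,4,1));   % hidden-layer outputs, one row per case

beta   = pinv(H)*c;                  % output weights from least squares
cFit   = H*beta;
maxErr = max(abs(cFit - c));
fprintf('Largest error: %.2e\n',maxErr);
```

There is no iteration: the only computation that depends on the targets is one linear solve, which
is why this kind of network learns quickly.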


1.5.9 Recursive Deep Learning 


Recursive deep learning [28] is a variation and extension of RNNs. The same set of neural node
weights is applied recursively over a structured input. That is, not all of the inputs are processed
in batch. Recursion is a standard method used in general estimation when data is coming in at
different times and you want the best estimate at the current time without having to process all
available data at once.
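
Recursion in estimation can be illustrated with the simplest case, a running mean. The sketch
below uses assumed noisy measurements of a constant and updates the estimate one sample at a time,
then compares the result with processing all of the data in a batch.

```matlab
rng(2);                          % repeatable noise
x = 5 + 0.1*randn(1,50);         % noisy measurements of a constant (assumed data)

m = 0;                           % running estimate
for k = 1:length(x)
  % Update using only the newest sample; earlier samples are not revisited
  m = m + (x(k) - m)/k;
end
fprintf('Recursive %.4f, batch %.4f\n',m,mean(x));
```

At every step m is the best estimate given the data so far, and only the previous estimate and
the sample count need to be stored.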
1.5.10 Generative Deep Learning


Generative deep learning allows a neural network to learn patterns [12] and then create com- 
pletely new material. A generative deep learning network can create articles, paintings, pho- 
tographs, and many other types of material. 


1.6 Applications of Deep Learning 


Deep learning is used in many applications today. Here are a few. 


Image recognition —This is arguably the best known and most controversial use of deep learn- 
ing. A deep learning system is trained with pictures of people. Cameras are distributed ev- 
erywhere and images captured. The system then identifies individual faces and matches 
them against its trained database. Even with variations of lighting, weather conditions, 
and clothing, the system can identify the people in the images. 


Speech recognition —You hardly ever get a human being on the phone anymore. You are first 
presented with a robotic listener that can identify what you are saying, at least within the 
limited context of what it expects. When a human listens to another human, the listener 
is not just recording the speech, he or she is guessing what the person is going to say and 
filling in gaps of garbled words and confusing grammar. Robotic listeners have some of 
the same abilities. A robotic listener is an embodiment of the "Turing test." Did you
ever get one that you thought was a human being? Or for that matter, did you ever reach 
a human who you thought was a robot? 


Handwriting analysis —A long time ago, you would get forms in which you had boxes in 
which to write numbers and letters. At first they had to be block capitals! A robotic hand- 
writing system could figure out the letters in those boxes reliably. Years later, though 
many years ago, the US Post Office introduced zip code reading systems. At first you 
had to put the zip code on a specific part of the envelope. That system has evolved so 
that it can find zip codes anywhere. This made the zip + 4 system really valuable and a 
big productivity boost. 


Machine translation —Google Translate does a pretty good job considering it can translate
almost any language in the world. It is an example of a system with online training. You 
see that when you type in a phrase and the translation has a check mark next to it because 
a human being has indicated that it is correct. Figure 1.10 gives an example. Google 
harnesses the services of free human translators to improve its product! 


Targeting —By targeting we mean figuring out what you want. This may be a movie, a 
clothing item, or a book. Deep learning systems collect information on what you like and 
decide what you would be most interested in buying. Figure 1.11 gives an example. This 
is from a couple of years ago. Perhaps ballet dancers like Star Wars! 


Other applications include game playing, autonomous driving, medicine, and many others. Just
about any human activity can be an application of deep learning.


Figure 1.10: Translation from Japanese into English. [Google Translate screenshot: a Japanese
word, romanized as "Kiku," is translated as "listen."]


Figure 1.11: Prediction of your buying patterns. [Amazon.com "Recommended for You" screenshot:
a men's Darth Vader costume with cape, belt, and mask is recommended because the customer
purchased a Ballet Is Fun turnboard.]


1.7 Organization of the Book


This book is organized around specific deep learning examples. You can jump into any chapter 
as they are pretty much independent. We've tried to present a wide range of topics, some 
of which, hopefully, align with your work or interests. The next chapter gives an overview of 
MATLAB products for deep learning. We only use three of their toolboxes in this book, besides 
the core MATLAB development environment. 

Each chapter except for the first and second is organized in the following order: 


1. Modeling 

2. Building the system 
3. Training the system 
4. Testing the system 


Training and testing are often in the same script. Modeling varies with each chapter. For 
physical problems, we derive numerical models, usually sets of differential equations, and build 
simulations of the processes. 

The chapters in this book present a range of relatively simple examples to help you learn
more about deep learning and its applications. They will also help you learn the limitations
of deep learning and areas for future research. All but Chapter 5 use the MATLAB Deep Learning
Toolbox.


1. What Is Deep Learning? (this chapter). 


2. MATLAB Machine and Deep Learning Toolboxes—This chapter gives you an intro- 
duction to MATLAB machine intelligence toolboxes. We'll be using three of the tool- 
boxes in this book. 


3. Finding Circles with Deep Learning— This is an elementary example. The system will 
try to figure out if a figure is a circle. It will be presented with circles, ellipses, and other 
objects and trained to determine which are circles. 


4. Classifying Movies—All movie databases try to guess what movies will be of most 
interest to their viewers to speed movie selection and reduce the number of disgruntled 
customers. This example creates a movie rating system and attempts to classify movies 
in the movie database as good or bad. 


5. Algorithmic Deep Learning— This is an example of fault detection using a detection 
filter as an element of the deep learning system. It uses a custom deep learning algorithm, 
the only example that does not use the MATLAB deep learning toolbox. 


6. Tokamak Disruption Detection—Disruptions are a major problem with a nuclear fusion
device known as a Tokamak. Researchers are using neural nets to detect disruptions
before they happen so that they can be stopped. In this example, we use a simplified
dynamical model to demonstrate deep learning.


7. Classifying a Ballet Dancer’s Pirouette—This example demonstrates how to use real-time
data in a deep learning system. It uses IMU data, received via Bluetooth, and camera input.
The data is combined to classify a dancer’s pirouette. This example also covers data
acquisition.


8. Completing Sentences—Writing systems sometimes attempt to predict what word or 
sentence fragment you are trying to use. We create a database of sentences and try to 
predict the remainder as soon as possible. 


9. Terrain-Based Navigation— The first cruise missiles used terrain mapping to reach their 
targets. This has been largely replaced by GPS. This system will identify where an air- 
craft is on a map and use past positions to predict future positions. 


10. Stock Prediction—Who wouldn't want a system that could create portfolios that would 
beat an index fund? Perhaps a stock prediction system that would find the next Apple or 
Microsoft! In this example, we create an artificial stock market and train the system to 
identify the best stocks. 


11. Image Classification— Training deep learning networks can take weeks. This chapter 
gives you an example of using a pretrained network. 


12. Orbit Determination—Orbits can be determined using only angle measurements. This 
chapter shows how fitnet can produce estimates of semi-major axis and eccentricity 
from angles. 


These are all very different problems. We give a brief summary of the theory behind each 
which is hopefully enough for you to understand the problem. There are hundreds of papers 
on each topic, and even textbooks on these subjects. The references provide more informa- 
tion. There are two broad methods for applying deep learning in MATLAB. One is using 
trainNetwork and the other is using the various feedforward functions. Table 1.1 summa- 
rizes which methods are used in each chapter. 

Chapter 11 uses pretrained networks, but these are similar to those produced by 
trainNetwork. Chapter 12 applies four types of network training to the same problem. 

For each problem, we are creating a world in which to work. For example, in the classifying
movies problem, we create a world of movies and viewers based on a particular model that we
create. This is akin to the famous "Blocks World" in which a world of colored blocks was
created. The artificial intelligence engine could reason and solve problems of stacking blocks
within the context of this world. Much as "Blocks World" did not map into general reasoning,
we do not claim that our code can be applied directly to real-world problems.

In each chapter, we will present a problem and give code that creates a deep learning net- 
work to solve the problem. We will show you the performance of the code and where it doesn't 
work as well as we would like. Deep learning is a work in progress, and it is important to un- 
derstand what works and what doesn't work. We encourage our readers to go beyond the code 
in the book and see if they can improve on its performance.


Table 1.1: Deep learning methods. The specific form of the network is shown. The last column
shows the application.

Chapter  Feed Forward                               trainNetwork         Application
2        feedforwardnet                                                  Regression
3                                                   [illegible]          Image classification
4        patternnet                                                      Classification
5        feedforwardnet                                                  Regression
6                                                   Bidirectional LSTM   Classification
7                                                   Bidirectional LSTM   [illegible]
8                                                   Bidirectional LSTM   Classification
9                                                   Convolutional        Image classification
10       [illegible]
11       [illegible]
12       feedforwardnet, fitnet, cascadeforwardnet  Bidirectional LSTM   Regression


We present much of the code in segments. Unless specified, you cannot cut and paste the 
code into the MATLAB command window and get a result. You should run the demos from the 
code base that is included with the book. Remember also that you will need the Deep Learning 
Toolbox and Instrument Control Toolbox for Chapter 7. The other chapters only require core 
MATLAB and the Deep Learning Toolbox. 

The code in this book was developed using MATLAB 2019a on a Macintosh MacBook Pro
under macOS 10.14.4. The code should work on all other operating systems, though processing
time may vary.


CHAPTER 2


MATLAB Machine Learning 
Toolboxes 


2.1 Commercial MATLAB Software 
2.1.1 MathWorks Products 


The MathWorks sells several packages for machine learning. Their toolboxes work directly 
with MATLAB and Simulink. The MathWorks products provide high-quality algorithms for 
data analysis along with graphics tools to visualize the data. Visualization tools are a critical 
part of any machine learning system. They can be used for data acquisition, for example, for 
image recognition or as part of systems for autonomous control of vehicles, or for diagnosis and 
debugging during development. All of these packages can be integrated with each other and 
with other MATLAB functions to produce powerful systems for machine learning. The most 
applicable toolboxes that we will discuss are listed in the following; we will use only the deep 
learning and the Instrument Control toolboxes in this book. 


• Deep Learning Toolbox

• Instrument Control Toolbox

• Statistics and Machine Learning Toolbox

• Neural Network Toolbox

• Computer Vision System Toolbox

• Image Acquisition Toolbox

• Parallel Computing Toolbox

• Text Analytics Toolbox


The breadth of MATLAB and Simulink products allows you to explore every facet of machine
learning and to connect with other areas of data science including controls, estimation,
and simulation. There are also many domain-specific toolboxes, such as the Automated Driving
Toolbox and Sensor Fusion and Tracking Toolbox, that can be used with the learning products.


Deep Learning Toolbox 


The deep learning toolbox allows you to design, build, and visualize convolutional neural
networks. You can implement existing, pretrained neural networks available on the Web, such as
GoogLeNet, VGG-16, VGG-19, AlexNet, and ResNet-50. GoogLeNet and AlexNet are image
classification networks and are discussed in Chapter 11. The deep learning toolbox has
extensive capabilities for visualization and debugging of neural networks. The debugging tools
are important to ensure that your system is behaving properly and help you to understand what
is going on inside your neural network. We will use this toolbox in almost all of our examples.


Instrument Control Toolbox 


The MATLAB Instrument Control Toolbox is designed to directly connect instruments. This 
simplifies the use of MATLAB with hardware. Examples include oscilloscopes, function gen- 
erators, and power supplies. The toolbox provides support for TCP/IP, UDP, I2C, SPI, and 
Bluetooth. With the Instrument Control Toolbox, you can integrate MATLAB directly into 
your laboratory workflow without the need for writing drivers or creating specialized MEX 
files. We use the Bluetooth functionality with an IMU in this book. 


Statistics and Machine Learning Toolbox 


The statistics and machine learning toolbox provides data analytics methods for gathering trends 
and patterns from massive amounts of data. These methods do not require a model for analyzing 
the data. The toolbox functions can be broadly divided into classification tools, regression tools, 
and clustering tools. Statistics are the foundation for much of deep learning. 

Classification methods are used to place data into different categories. For example, data, in 
the form of an image, might be used to classify an image of an organ as having a tumor. Classifi- 
cation is used for handwriting recognition, credit scoring, and face identification. Classification 
methods include support vector machines (SVM), decision trees, and neural networks. 

Regression methods let you build models from current data to predict future data. The 
models can then be updated as new data becomes available. If the data is only used once to 
create the model, then it is a batch method. A regression method that incorporates data as it 
becomes available is a recursive method. 
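
The distinction between batch and recursive regression can be sketched in core MATLAB. The
straight-line model, the noise level, and the initial covariance of 1e6 are assumptions made
for this illustration; the Statistics and Machine Learning Toolbox is not used.

```matlab
% Batch versus recursive least squares fit of y = p(1) + p(2)*t
rng(0);
n = 50;
t = linspace(0,1,n)';
y = 2 + 3*t + 0.05*randn(n,1);     % hypothetical noisy data

% Batch: all of the data is used once
A      = [ones(n,1) t];
pBatch = A\y;

% Recursive: the estimate is updated as each point arrives
pRec = zeros(2,1);
P    = 1e6*eye(2);                 % large initial covariance, i.e., no prior knowledge
for k = 1:n
  phi  = A(k,:)';
  K    = P*phi/(1 + phi'*P*phi);   % gain for the new measurement
  pRec = pRec + K*(y(k) - phi'*pRec);
  P    = P - K*phi'*P;
end

fprintf('Batch     %8.4f %8.4f\n',pBatch);
fprintf('Recursive %8.4f %8.4f\n',pRec);
```

After all 50 points have been processed, the two estimates agree closely. The recursive form
never needs more than the newest data point and the current estimate.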

Clustering finds natural groupings in data. Object recognition is an application of clustering 
methods. For example, if you want to find a car in an image, you look for data that is associated 
with the part of an image that is a car. While cars are of different shapes and sizes, they have 
many features in common. Clustering can also deal with different orientations and scalings. 

The toolbox has many functions to support these areas and many that do not fit neatly into 
these categories. The statistics and machine learning toolbox provides professional tools that 
are seamlessly integrated into the MATLAB environment.


Computer Vision System Toolbox

The MATLAB Computer Vision System Toolbox provides functions for developing computer 
vision systems. The toolbox provides extensive support for video processing. It includes func- 
tions for feature detection and extraction. Prior to the extensive use of deep learning, feature 
detection was the approach for image identification. It also supports 3D vision and can process 
information from stereo cameras. 3D motion detection is supported. 


Image Acquisition Toolbox 
The MATLAB Image Acquisition Toolbox provides functions for connecting cameras directly 
into MATLAB without the need for intermediary software or using the apps that come with 
many cameras. You can use the package to interact with the sensors directly. Foreground and 
background acquisition is supported. The toolbox supports all major standards and hardware 
vendors. It makes it easier to design deep learning image processing software using real data. It 
allows control of cameras as is shown in the chapter on using images as part of deep learning. 
The Image Acquisition Toolbox supports USB3 Vision, GigE Vision, and GenICam GenTL. 
You can connect to Velodyne LiDAR® sensors, machine vision cameras, and frame grabbers,
as well as high-end scientific and industrial devices. USB3 gives you considerable control over 
the camera and is used in Chapter 7. 


Parallel Computing Toolbox 

The Parallel Computing Toolbox allows you to use multicore processors, graphical processing 
units (GPUs), and computer clusters with your MATLAB software. It allows you to easily 
parallelize algorithms using high-level programming constructs like parallel for loops. Some 
functions in the deep learning toolbox can take advantage of GPUs and parallel processing. 
There is an example of potential GPU use in Chapter 10. As almost every personal computer 
has a GPU, this can be a worthwhile addition to your MATLAB software. 
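
A parallel for loop looks like the following sketch. The loop body is a made-up workload
chosen only because its iterations are independent. If the Parallel Computing Toolbox is not
installed, MATLAB runs a parfor loop as an ordinary serial loop, so the script still works.

```matlab
% Parallel for loop with independent iterations (hypothetical workload)
n = 8;
s = zeros(1,n);
parfor k = 1:n
  s(k) = sum(svd(magic(k+2)));   % each iteration uses only its own index
end

% Serial reference for comparison
sSerial = zeros(1,n);
for k = 1:n
  sSerial(k) = sum(svd(magic(k+2)));
end

fprintf('Largest difference %g\n',max(abs(s - sSerial)));
```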


Text Analytics Toolbox 

Text Analytics Toolbox provides algorithms and visualizations for working with text data. Mod- 
els created with the toolbox can be used in applications such as sentiment analysis, predictive 
maintenance, and topic modeling. The toolbox includes tools for processing raw text from 
many sources. You can extract individual words, convert text into numerical representations, 
and build statistical models. This is a useful adjunct to deep learning. 


2.2 MATLAB Open Source 


MATLAB open source tools are a great resource for implementing machine learning. Machine 
learning and convex optimization packages are available. Universities are constantly producing 
new neural network toolsets. Much work is done in Python, but MATLAB is a very popular 
base for software development and AI work.


2.2.1 Deep Learn Toolbox


The Deep Learn Toolbox by Rasmus Berg Palm is a MATLAB toolbox for deep learning. It 
includes Deep Belief Nets, Stacked Autoencoders, Convolutional neural nets, and other neural 
net functions. It is available through the MathWorks File Exchange. 


2.2.2 Deep Neural Network 


The Deep Neural Network by Masayuki Tanaka provides deep learning tools for deep belief
networks built from stacked restricted Boltzmann machines. It has functionality for both
unsupervised and supervised learning. It is available through the MathWorks File Exchange.


2.2.3 MatConvNet 


MatConvNet implements Convolutional Neural Networks for image processing. It includes a 
range of pretrained networks for image processing functions. You can find it by searching on 
the name or at the time of printing at www.vlfeat.org/matconvnet/. This package is open source 
and is open to contributors. 


2.2.4 Pattern Recognition and Machine Learning Toolbox (PRMLT) 


This toolbox implements the functionality of the book Pattern Recognition and Machine Learn- 
ing, by Christopher Bishop [5]. The book is an excellent reference, and the code makes it easy 
to use the algorithms discussed in the book. 


2.3 XOR Example 


We'll give many examples of the deep learning toolbox in subsequent chapters. We'll do one 
example just to get you going. This example doesn't even unlock a fraction of the power 
in the deep learning toolbox. We will implement the XOR example which we also did in 
Chapter 1. The DLXOR.m script is shown in the following, using the MATLAB functions
feedforwardnet, configure, train, and sim.


DLXOR.m

%% Use the Deep Learning Toolbox to create the XOR neural net

%% Create the network
% 2 layers
% 2 inputs
% 1 output

net = feedforwardnet(2);

% XOR Truth table
a = [1 0 1 0];
b = [1 0 0 1];
c = [0 0 1 1];

% How many sets of inputs
n = 600;

% This determines the number of inputs and outputs
x = zeros(2,n);
y = zeros(1,n);

% Create training pairs
for k = 1:n
  j      = randi([1,4]);
  x(:,k) = [a(j);b(j)];
  y(k)   = c(j);
end

net      = configure(net, x, y);
net.name = 'XOR';
net      = train(net, x, y);
c        = sim(net,[a;b]);

fprintf('\n    a     b     c\n');
for k = 1:4
  fprintf('%5.0f %5.0f %5.2f\n',a(k),b(k),c(k));
end

% This only works for feedforwardnet(2);
fprintf('\nHidden layer biases %6.3f %6.3f\n',net.b{1});
fprintf('Output layer bias %6.3f\n',net.b{2});
fprintf('Input layer weights %6.2f %6.2f\n',net.IW{1}(1,:));
fprintf('                    %6.2f %6.2f\n',net.IW{1}(2,:));
fprintf('Output layer weights %6.2f %6.2f\n',net.LW{2,1}(1,:));
fprintf('Hidden layer activation function %s\n',net.layers{1}.transferFcn);
fprintf('Output layer activation function %s\n',net.layers{2}.transferFcn);


Running the script produces the MATLAB GUI shown in Figure 2.1. 


As you can see, we have two inputs, one hidden layer and one output layer. The diagram 
indicates that our hidden layer activation function is nonlinear, while the output layer is linear. 
The GUI is interactive, and you can study the learning process by clicking the buttons. For 
example, if you click the performance button, you get Figure 2.2. Just about everything in 
the network development is customizable. The GUI is a real-time display. You can watch the 
training in progress. If you just want to look at the layout, type view(net).

Figure 2.1: Deep learning network GUI. [Neural Network Training (nntraintool) window with
Algorithms, Progress, and Plots panels.]

The three major boxes in the GUI are Algorithms, Progress, and Plots. Under Algorithms
we have


• Data division—Data division divides the data into training, testing, and validation sets.
"Random" says that the division between the three categories is done randomly.


• Training—This shows the training method to be used, here Levenberg-Marquardt (trainlm).


• Performance—This says that mean squared error is used to determine how well the
network works. Other methods, such as maximum absolute error, could be used. Mean
squared error is useful because the error grows as the square of the deviation, meaning
that large errors are more heavily weighted.


• Calculations—This shows that the calculations are done via a MEX file, that is, in a C
or C++ program.


Figure 2.2: Network training performance. [Plot of mean squared error (mse) versus epoch on a
logarithmic axis with Train, Validation, Test, and Best lines. Best validation performance is
7.2064e-26 at epoch 5.]
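
The difference between mean squared error and maximum absolute error, both mentioned under
Performance, can be seen with a few lines of core MATLAB. The error values are made up for
this illustration.

```matlab
% Compare two performance measures on the same hypothetical errors
e = [0.1 -0.2 0.05 1.0];          % targets minus outputs

mSE    = mean(e.^2);              % mean squared error
maxAbs = max(abs(e));             % maximum absolute error
share  = max(e.^2)/sum(e.^2);     % part of the squared error due to the largest error

fprintf('MSE %8.6f  Max abs %4.2f  Share of largest error %5.3f\n',mSE,maxAbs,share);
```

The single large error accounts for about 95% of the sum of squares, which is what is meant by
large errors being more heavily weighted.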


The Progress section of the GUI is useful to watch during long training sessions. We are
seeing it at the end.


• Epoch—Says five epochs were used. The range is 0 to 1000 epochs.


• Time—Gives you the clock time during training.


• Performance—Shows you the MSE performance during training.


• Gradient—Shows the gradient, which indicates the speed of training as discussed earlier.


• Mu—Mu is the control parameter for training the neural network.


• Validation checks—Shows that no validation checks failed.


The last section is Plots. There are four figures we can study to understand the process.


Figure 2.3: Network training state. [Three stacked plots versus epoch: gradient (8.9658e-14 at
epoch 5), mu (1e-08 at epoch 5), and validation checks (0 at epoch 5).]


Figure 2.2 shows the training performance as a function of epoch. Mean squared error is
the criterion. The test, validation, and training sets have their own lines. In this training,
all have the same values.

Figure 2.3 shows the training state as a function of epoch. Five epochs are used. The
titles show the final values in each plot. The top plot shows the progression of the gradient.
It decreases with each epoch. The next shows mu decreasing with each epoch, from its initial
value of 0.001 to 1e-08. The bottom plot shows that there were no validation failures during
the training.

Figure 2.4 gives a training histogram. This shows how many samples in each set have the error
value given on the x-axis. The bars are divided into training, validation, and test sets. Each
number on the x-axis is a bin. Only three bins are occupied in this case. The histogram shows
that the training sets are more numerous than the testing or validation sets.

Figure 2.4: Network training histogram. [Error histogram with 20 bins, where
Errors = Targets - Outputs. The legend includes Training and Zero Error.]

Figure 2.5 gives a training regression. There are four subplots: one for training sets, one
for validation sets, one for test sets, and one for all sets. There are only two targets, 0
and 1. The linear fit doesn't give much information in this case since a line can always be
fit through two points. The plot title says we reached the minimum gradient after 5 epochs,
that is, after passing all the cases through the training five times. The legend shows the
data, the fit, and the Y=T line, which is the same as the linear fit in this system.

Typing

>> net = feedforwardnet(2);

creates the neural network data structure, which is quite flexible and complex. The "2"
means two neurons in one hidden layer. If we wanted two hidden layers with two neurons each,
we would type

>> net = feedforwardnet([2 2]);


Figure 2.5: Regression. [Regression plots (plotregression) at epoch 5.]

We create 600 training sets. net = configure(net, x, y); configures the network. The
configure function determines the number of inputs and outputs from the x and y arrays. The
network is trained with net = train(net,x,y); and simulated with c = sim(net,[a;b]);. We
extract the weights and biases from the cell arrays net.IW, net.LW, and net.b. "IW" stands
for input weights and "LW" for layer weights. The input weights go from the single input, a
two-element vector, to the two hidden nodes, and the layer weights go from the two hidden
nodes to the one output node.
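
To show how the weights and biases fit together, the following core MATLAB sketch writes out
the forward pass of a network with two hidden nodes by hand. The weight values are
hypothetical, picked by hand so that the result reproduces XOR; they are not the output of
train. The sketch also assumes that inputs and outputs are scaled between [0,1] and [-1,1],
which mimics the mapminmax processing that feedforwardnet applies by default. If that
processing were changed, the two scaling lines would change with it.

```matlab
% Forward pass of a 2-2-1 network written out by hand (hypothetical weights)
IW = [2 2; 2 2];          % input weights, the role played by net.IW{1}
b1 = [2; -2];             % hidden layer biases, net.b{1}
LW = [1.057 -1.057];      % layer weights, net.LW{2,1}
b2 = -1.038;              % output layer bias, net.b{2}

a = [1 0 1 0];
b = [1 0 0 1];
x = [a;b];

xN = 2*x - 1;             % assumed input scaling from [0,1] to [-1,1]
h  = tanh(IW*xN + b1);    % hidden layer with tansig, which equals tanh
yN = LW*h + b2;           % output layer with purelin
c  = (yN + 1)/2;          % assumed output scaling back to [0,1]

for k = 1:4
  fprintf('%5.0f %5.0f %5.2f\n',a(k),b(k),c(k));
end
```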

Now the training sets are created randomly from the truth table. You can run this script
many times, and usually you will get the right result, but not always. This is an example
where it worked well.

>> DLXOR

    a     b     c
    1     1  0.00
    0     0  0.00
    1     0  1.00
    0     1  1.00

Hidden layer biases 1.735 -1.906
Output layer bias [illegible]
Input layer weights [illegible] [illegible]
                    [illegible] 1.04
Output layer weights -1.16 [illegible]
Hidden layer activation function tansig
Output layer activation function purelin

tansig is the hyperbolic tangent.
Each run will result in different weights, even when the network gives the correct results.
For example:

>> DLXOR

    a     b     c
    1     1  0.00
    0     0  0.00
    1     0  1.00
    0     1  1.00

Hidden layer biases 4.178 0.075
Output layer bias -1.087
Input layer weights [illegible] [illegible]
                    [illegible] [illegible]
Output layer weights [illegible] [illegible]


There are many possible sets of weights and biases because the weight/bias sets are not unique.
Note that the outputs shown as 0.00 are not exactly 0. This means that, used operationally, we
would need to apply a threshold to the output.
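
One possible threshold is sketched here. Both the value of 0.5 and the sample outputs are
assumptions made for illustration; the outputs stand in for what sim might return.

```matlab
% Threshold the network output to get a logical XOR result
c      = [0.0003 -0.0001 0.9998 1.0002];   % hypothetical outputs of sim
cLogic = c > 0.5;                          % assumed threshold halfway between 0 and 1
disp(cLogic)
```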


You might be interested in what happens if we add another layer to the network, by creating
it with net = feedforwardnet([2 2]). Figure 2.6 shows the network in the GUI.

The additional hidden layer makes it easier for the neural net to fit the data from which it
is learning. On the left are the two inputs, a and b. In each hidden layer, there is a weight w
and bias b. Weights are always needed, but biases are not always used. Both hidden layers
have nonlinear activation functions. The output layer produces the one output using a linear
activation function.

Figure 2.6: Deep learning network GUI with two hidden layers. [Neural Network Training window
showing the Performance (mean squared error), Progress, and Validation Checks entries.]

The output for the four cases is

    a     b     c
    1     1  0.00
    0     0  0.00
    1     0  1.00
    0     1  1.00


This produces good results too. We haven't explored all the diagnostic tools available 
when using feedforwardnet. There is a lot of flexibility in the software. You can change 
activation functions, change the number of hidden layers, and customize it in many different 
ways. This particular example is very simple as the input sets are limited to four possibilities. 

We can explore what happens when the inputs are noisy, not necessarily all ones or zeros.
We do this in DLXORNoisy.m, and the only difference from the original script is in lines
33-35 where we add Gaussian noise to the inputs.


DLXORNoisy.m
a = a + 0.01*randn(1,4);
b = b + 0.01*randn(1,4);
c = sim(net,[a;b]);


The output from running this script is shown as follows.

>> DLXORNoisy

      a       b       c
[illegible] [illegible] [illegible]
  0.001  -0.005  -0.002
  0.996   0.009   0.999
 -0.001   1.000   1.000

Hidden layer biases -1.793 2.135
Output layer bias [illegible]
Input layer weights [illegible] [illegible]
                    [illegible] [illegible]
Output layer weights -1.11 [illegible]
Hidden layer activation function tansig
Output layer activation function purelin


As one might expect, the outputs are not exactly one or zero. 


2.4 Training 


The neural net is a nonlinear system due to the nonlinear activation functions. The
Levenberg-Marquardt training algorithm is one way of solving a nonlinear least squares
problem. This algorithm only finds a local minimum, which may or may not be a global minimum.
Other algorithms, such as genetic algorithms, downhill simplex, simulated annealing, and so on,
could also be used for finding the weights and biases. To achieve second-order training
speeds, one has to compute the Hessian matrix. The Hessian matrix is a square matrix of the
second-order partial derivatives of a scalar-valued function. Suppose we have a nonlinear
function


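\[ f(x_1, x_2) \tag{2.1} \]

then the Hessian is

\[ H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\[2ex] \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} \end{bmatrix} \tag{2.2} \]

$x_k$ are weights and biases. This can be very expensive to compute. In the Levenberg-Marquardt
algorithm, we make an approximation

\[ H = J^T J \tag{2.3} \]

where

\[ J = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} \end{bmatrix} \tag{2.4} \]

The approximate Hessian is

\[ H = \begin{bmatrix} \left(\dfrac{\partial f}{\partial x_1}\right)^2 & \dfrac{\partial f}{\partial x_1}\dfrac{\partial f}{\partial x_2} \\[2ex] \dfrac{\partial f}{\partial x_1}\dfrac{\partial f}{\partial x_2} & \left(\dfrac{\partial f}{\partial x_2}\right)^2 \end{bmatrix} \tag{2.5} \]

This is an approximation of the second derivative. The gradient is

\[ g = J^T e \tag{2.6} \]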

where e is a vector of errors. The Levenberg-Marquardt algorithm uses the following update for
the weights and biases:

$$x_{k+1} = x_k - \left[J^T J + \mu I\right]^{-1} J^T e \tag{2.7}$$

I is the identity matrix (ones on the diagonal and zeros elsewhere). If the parameter $\mu$ is
zero, this is Newton's method with the approximate Hessian. With a large $\mu$, this becomes
gradient descent with a small step size. Thus $\mu$ is a control parameter. Newton's method is
faster and more accurate near a minimum, so after a successful step we decrease $\mu$, and we
increase $\mu$ only when a step would increase the error.
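The update in (2.7) can be sketched in a few lines of MATLAB. This is only an illustration, not the code used inside the toolbox. The model y = x(1)*exp(x(2)*t), the data, the starting guess, and the factor of 10 used to adjust mu are all assumptions chosen for the example.

```matlab
% Levenberg-Marquardt sketch: fit y = x(1)*exp(x(2)*t) to noise-free data
t     = linspace(0,1,20)';          % sample times (assumed)
xTrue = [2; -1.5];                  % parameters used to make the data (assumed)
yData = xTrue(1)*exp(xTrue(2)*t);   % measurements

x  = [1; 0];                        % initial guess (assumed)
mu = 1e-2;                          % initial control parameter (assumed)
for k = 1:100
  e    = x(1)*exp(x(2)*t) - yData;             % error vector
  J    = [exp(x(2)*t), x(1)*t.*exp(x(2)*t)];   % Jacobian of e with respect to x
  xNew = x - (J'*J + mu*eye(2))\(J'*e);        % Equation (2.7)
  eNew = xNew(1)*exp(xNew(2)*t) - yData;
  if eNew'*eNew < e'*e
    x  = xNew;                      % successful step: accept it, move toward Newton
    mu = mu/10;
  else
    mu = mu*10;                     % failed step: move toward gradient descent
  end
end
fprintf('x = [%8.5f %8.5f]\n',x);
```

With these assumed values the estimate should settle on the parameters used to generate the data. A poor starting guess can still leave the iteration in a local minimum.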

Why are gradients so important and why can they get us into trouble? Figure 2.7 shows
a curve with a local and global minimum. If our search first enters the local minimum, the
gradient is steep and will drive us to the bottom, from which we might not get out. Thus we would
not have found the best solution.
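A small numerical experiment makes the point. The quartic cost below is not the curve in Figure 2.7; it is an assumed function with one local and one global minimum, and the step size and starting points are also assumptions.

```matlab
% Gradient descent on a curve with a local and a global minimum
f     = @(x) x.^4 - 3*x.^2 + x;   % cost (assumed for illustration)
df    = @(x) 4*x.^3 - 6*x + 1;    % its gradient
alpha = 0.01;                     % step size (assumed)
xA    = 2;                        % start on the right-hand side
xB    = -2;                       % start on the left-hand side
for k = 1:2000
  xA = xA - alpha*df(xA);
  xB = xB - alpha*df(xB);
end
fprintf('Start  2: x = %7.4f  f = %7.4f\n',xA,f(xA));
fprintf('Start -2: x = %7.4f  f = %7.4f\n',xB,f(xB));
```

The search that starts at 2 stops in the shallower minimum on the right, while the search that starts at -2 reaches the deeper minimum on the left. Nothing in the gradient tells the first search that a better solution exists.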

The cost can be very complex even for simple problems. A famous example that gives
insight into this is Zermelo’s problem, which is discussed in Section 2.5.


2.5  Zermelo's Problem 


Figure 2.7: Local and global minimums.

Insight into issues of global optimization can be obtained by examining Zermelo’s problem
[7]. Zermelo’s problem is a 2D trajectory problem of a vehicle at constant speed in a velocity
field in which the velocity is a function of position, for example, a ship with a given maximum
speed moving through strong currents. The magnitude and direction of the currents in each axis
(u,v) are functions of the position: u(x, y) and v(x, y). The goal is to steer the ship to find a
minimum time trajectory between two points. An analytical solution is possible for the case
where only the current in the x direction is nonzero, and it is a linear function of the y position
of the vehicle. The equations of motion f are

$$\begin{aligned} \dot{x} &= V\cos\theta + u(x,y) = V\cos\theta - V\frac{y}{h} \\ \dot{y} &= V\sin\theta + v(x,y) = V\sin\theta \end{aligned} \tag{2.8}$$

V is the velocity of the vehicle relative to the current, which is constant, and $\theta$ is the angle of the
vehicle relative to the x-axis and is the control in the problem. The problem has a characteristic
dimension of h. The Hamiltonian of the system is

$$H = \lambda_x (V\cos\theta + u) + \lambda_y (V\sin\theta + v) + 1 \tag{2.9}$$
The solution of the optimal control problem requires the solution of the following equations:

$$\dot{x} = f(x, u, t) \tag{2.10}$$

$$\dot{\lambda}^T(t) = -\frac{\partial H}{\partial x} \tag{2.11}$$

$$\frac{\partial H}{\partial u} = 0 \tag{2.12}$$

If the final time is not constrained and the Hamiltonian is not an explicit function of time, then

$$H(t) = 0 \tag{2.13}$$

The costate equations are

$$\dot{\lambda} = -\left(\frac{\partial f}{\partial x}\right)^T \lambda \tag{2.14}$$

where the boundary conditions of the partials with respect to the vector x are unknown. The
optimality condition from (2.12) becomes

$$0 = f_u^T \lambda \tag{2.15}$$


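where the subscript denotes the partial with respect to the control vector u, which provides a
relationship between the controls and the costates.

Applying the costate equations (2.14), we first compute the partials of the state equations

$$\frac{\partial f}{\partial x} = \begin{bmatrix} 0 & -\dfrac{V}{h} \\ 0 & 0 \end{bmatrix} \tag{2.16}$$

so applying the transpose to the partials matrix and substituting in, we then have the costate
derivatives.

$$\dot{\lambda} = -\begin{bmatrix} 0 & 0 \\ -\dfrac{V}{h} & 0 \end{bmatrix} \begin{bmatrix} \lambda_x \\ \lambda_y \end{bmatrix} = \begin{bmatrix} 0 \\ \lambda_x \dfrac{V}{h} \end{bmatrix} \tag{2.17}$$

We then compute the partials of the state equations with respect to the control

$$\frac{\partial f}{\partial u} = \begin{bmatrix} -V\sin\theta \\ V\cos\theta \end{bmatrix} \tag{2.18}$$

so that the control angle $\theta$ can then be computed from the optimality condition in (2.15).

$$\begin{bmatrix} -V\sin\theta & V\cos\theta \end{bmatrix} \begin{bmatrix} \lambda_x \\ \lambda_y \end{bmatrix} = 0, \qquad \tan\theta = \frac{\lambda_y}{\lambda_x} \tag{2.19}$$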


The preceding equations are used for the indirect optimization method. We can also compute
the analytical solution for this problem when the final position is the origin (0,0). The
optimal control angle as a function of the current position is expressed as

$$\frac{y}{h} = \sec\theta - \sec\theta_f \tag{2.20}$$

$$\frac{x}{h} = -\frac{1}{2}\left[\sec\theta_f(\tan\theta_f - \tan\theta) - \tan\theta(\sec\theta_f - \sec\theta) + \log\frac{\tan\theta_f + \sec\theta_f}{\tan\theta + \sec\theta}\right] \tag{2.21}$$

where $\theta_f$ is the final control angle and log is the natural logarithm. These equations enable us to solve for the
initial and final control angles $\theta_0$ and $\theta_f$ given an initial position. Note that this corrects a sign
error in the text.
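As a quick check, the two analytical expressions can be evaluated for a pair of control angles. The angles below, 105 degrees initially and 240 degrees at the end, are assumptions for illustration; they are not stated in this section. The leading minus sign on x/h is the one obtained by integrating the equations of motion (2.8) along the optimal path.

```matlab
% Evaluate the analytical Zermelo solution for assumed control angles
theta0 = 105*pi/180;   % initial control angle (assumed)
thetaF = 240*pi/180;   % final control angle (assumed)

yH = sec(theta0) - sec(thetaF);
xH = -0.5*( sec(thetaF)*(tan(thetaF) - tan(theta0)) ...
          - tan(theta0)*(sec(thetaF) - sec(theta0)) ...
          + log((tan(thetaF) + sec(thetaF))/(tan(theta0) + sec(theta0))) );

fprintf('x/h = %6.3f   y/h = %6.3f\n',xH,yH);
```

With these angles the result is close to the initial position [3.66; -1.86] used for the numerical comparisons. Going the other way, from a position to the two angles, requires a numerical root finder.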
Figure 2.8: Zermelo’s problem cost.

Table 2.1: Solutions.

Method                 lambda_x    lambda_y    lambda_x/lambda_y
Analytical                         1.866025    -0.26795
Downhill simplex       -0.65946    2.9404      -0.22428
Simulated annealing    -0.68652    2.4593      -0.27915
Genetic algorithm      -0.78899    2.9404      -0.26833


The cost, while appearing simple, produces a very complex surface as shown in Figure 2.8. 
There are very flat regions and then a series of deep valleys. 

For each method being tested, the optimization parameters have been chosen, by a
certain amount of trial and error, to get the best results. The final $\lambda$ vector is different in
each case. However, the control is determined by the ratio of its components, so the magnitudes are not important.
Table 2.1 gives the analytical and numerical solutions for the problem. The initial conditions
are [3.66; -1.86] and the final conditions are [0; 0].
CHAPTER 3


Finding Circles with Deep Learning


3.1 Introduction 


Finding circles is a classification problem. Given a set of geometric shapes, we want the
deep learning system to classify a shape as either a circle or something else. This is much
simpler than classifying faces or digits. It is a good way to determine how well your
classification system works. We will apply a convolutional network to the problem as it is the most
appropriate for classifying image data.

In this chapter, we will first generate a set of image data. This will be a set of ellipses, a
subset of which will be circles. Then we will build the neural net, using convolution, and train
it to identify the circles. Finally, we will test the net and try different training
options and layer architectures.


3.2 Structure 


The convolutional network consists of multiple layers. Each layer has a specific purpose. The 
layers may be repeated with different parameters as part of the convolutional network. The 
layer types we will use are 


1. imageInputLayer 
2. convolution2dLayer 
3. batchNormalizationLayer 
4. reluLayer 
5. maxPooling2dLayer 
6. fullyConnectedLayer 
7. softmaxLayer 
8. classificationLayer 
You can have multiple layers of each type. Some convolutional nets have hundreds
of layers. Krizhevsky [1] and Bai [3] give guidelines for organizing the layers. Studying the
loss in the training and validation can guide you to improving your neural network.


3.2.1 imageInputLayer


This tells the network the size of the images. For example: 


layer = imageInputLayer([28 28 3]);


says the image is RGB and 28 by 28 pixels. 


3.2.2 convolution2dLayer 


Convolution is the process of highlighting expected features in an image. This layer applies
sliding convolutional filters to an image to extract features. You can specify the filters and the
stride. Convolution multiplies a small matrix, the filter, element by element with each patch
of the image and sums the products. You define the size of the matrices and
their contents. For most images, like images of faces, you need multiple filters. Some types of
filters are


1. Blurring filter ones(3,3)/9
2. Sharpening filter [0 -1 0;-1 5 -1;0 -1 0]
3. Horizontal Sobel filter for edge detection [-1 -2 -1; 0 0 0; 1 2 1]
4. Vertical Sobel filter for edge detection [-1 0 1;-2 0 2;-1 0 1]
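To see what these filters do, we can apply them to a small test image with the base MATLAB function conv2. The 7 by 7 image with a single vertical edge is an assumption made for this sketch. Note that conv2 rotates the filter by 180 degrees before sliding it, which for the Sobel filters only changes the sign of the response.

```matlab
% Apply the example filters to a small test image with conv2
img = zeros(7,7);
img(:,4:7) = 1;                       % vertical step edge (assumed test image)

blur   = ones(3,3)/9;
sharp  = [0 -1 0;-1 5 -1;0 -1 0];
sobelH = [-1 -2 -1; 0 0 0; 1 2 1];    % responds to horizontal edges
sobelV = [-1 0 1;-2 0 2;-1 0 1];      % responds to vertical edges

% 'valid' keeps only positions where the filter fits inside the image
outBlur   = conv2(img,blur,'valid');
outSharp  = conv2(img,sharp,'valid');
outSobelH = conv2(img,sobelH,'valid');
outSobelV = conv2(img,sobelV,'valid');
```

The horizontal Sobel output is zero everywhere because the image has no horizontal edges. The vertical Sobel output is nonzero only in the columns next to the step. The blurring and sharpening filters leave the flat regions unchanged and only alter the pixels near the edge.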


We create an n-by-n mask that we apply to an m-by-m matrix of data where m is greater 
than n. We start in the upper left corner of the matrix, as shown in Figure 3.1. We multiply 
the mask times the corresponding elements in the input matrix and do a double sum. That is 
the first element of the convolved output. We then move it column by column until the highest 
column of the mask is aligned with the highest column of the input matrix. We then return it 
to the first column and increment the row. We continue until we have traversed the entire input 
matrix and our mask is aligned with the maximum row and maximum column. 

The mask represents a feature. In effect we are seeing if the feature appears in different
areas of the image. Here is an example. We have a 2 by 2 mask with an L. The script
Convolution.m demonstrates convolution.


Figure 3.1: Convolution process showing the mask at the beginning and end of the process.


Convolution.m

%% Demonstrate convolution

filter = [1 0;1 1]

image = [0 0 0 0 0 0;
         0 0 0 0 0 0;
         0 0 1 0 0 0;
         0 0 1 1 0 0;
         0 0 0 0 0 0;
         0 0 0 0 0 0]

out = zeros(4,4);

for k = 1:4
  for j = 1:4
    g = k:k+1;   % rows under the mask
    f = j:j+1;   % columns under the mask
    out(k,j) = sum(sum(filter.*image(g,f)));
  end
end

out


The 3 appears where the "L" is in the image.


>> Convolution

filter =
     1     0
     1     1

image =
     0     0     0     0     0     0
     0     0     0     0     0     0
     0     0     1     0     0     0
     0     0     1     1     0     0
     0     0     0     0     0     0
     0     0     0     0     0     0

out =
     0     0     0     0
     0     1     1     0
     0     1     3     1
     0     0     1     1
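The sliding-mask sum can be cross-checked against the built-in conv2. Because conv2 rotates its kernel by 180 degrees, we rotate the mask first with rot90 so that the result matches the double loop. The variable names mask and img are chosen here to avoid shadowing the MATLAB functions filter and image; the 6 by 6 image size is an assumption based on the printed output.

```matlab
% Reproduce the sliding-mask sum with conv2
mask = [1 0;1 1];
img  = zeros(6,6);
img(3,3) = 1; img(4,3) = 1; img(4,4) = 1;    % the "L"

% Rotating the mask by 180 degrees turns convolution into the
% multiply-and-sum used in the double loop
out = conv2(img,rot90(mask,2),'valid');      % 5-by-5 result
disp(out(1:4,1:4))
```

The upper-left 4 by 4 block agrees with the loop output, with the 3 at the location of the "L". The 'valid' option returns every position where the mask fits, so the full result has one more row and column than the loop produces.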


We can have multiple masks. There is one bias and one weight for each element of the 
mask for each feature. In this case, the convolution works on the image itself. Convolutions 
can also be applied to the output of other convolutional layers or pooling layers. Pooling layers 
further condense the data. In deep learning, the masks are determined as part of the learning 
process. Each pixel in a mask has a weight and may have a bias; these are computed from 
the learning data. Convolution should highlight important features in the data. Subsequent
convolution layers narrow down features. The MATLAB function has two inputs: the
filterSize, specifying the height and width of the filters as either a scalar or an array of [h
w], and numFilters, the number of filters.


3.2.3 batchNormalizationLayer 


A batch normalization layer normalizes each input channel across a mini-batch. The size of the
mini-batch is set in the training options. This speeds up training and reduces the sensitivity to the initialization.
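The arithmetic behind the normalization is short. The sketch below is not the toolbox implementation; the sample activations, the small constant epsilon, and the initial scale and offset are assumptions.

```matlab
% Normalize one channel over a mini-batch (illustrative sketch)
x       = [1 2 3 4 5 6 7 8];         % activations of one channel (assumed)
epsilon = 1e-5;                      % avoids division by zero (assumed value)

mu      = mean(x);
sigma2  = mean((x - mu).^2);         % variance over the mini-batch
xHat    = (x - mu)/sqrt(sigma2 + epsilon);

scale   = 1;                         % learnable scale (assumed initial value)
offset  = 0;                         % learnable offset (assumed initial value)
y       = scale*xHat + offset;
```

The output has zero mean and very nearly unit variance regardless of the range of the inputs, which is why the layers that follow are less sensitive to how the weights were initialized.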


3.2.4 reluLayer 


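reluLayer is a layer that uses the rectified linear unit activation function.

$$f(x) = \begin{cases} x & x \geq 0 \\ 0 & x < 0 \end{cases} \tag{3.1}$$

Its derivative is

$$\frac{df}{dx} = \begin{cases} 1 & x \geq 0 \\ 0 & x < 0 \end{cases} \tag{3.2}$$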


This is very fast to compute. It says that the neuron is only activated for positive values, and 
the activation is linear for any value greater than zero. You can adjust the activation point with 
a bias. This code snippet generates a plot of reluLayer: 


x = linspace(-8,8);
y = x;
y(y<0) = 0;
PlotSet(x,y,'x label','Input','y label','reluLayer','plot title','reluLayer')


Figure 3.2: reluLayer.


Figure 3.2 shows the activation function. An alternative is a leaky reluLayer where the value is
not zero below zero. The only difference is in the y computation in the snippet:


x = linspace(-8,8);
y = x;
y(y<0) = 0.01*x(y<0);
PlotSet(x,y,'x label','Input','y label','reluLayer','plot title','leaky reluLayer')


Figure 3.3 shows the leaky function. Below zero it has a slight slope. 

A leaky ReLU layer solves the dead ReLU problem, where the network stops learning because
the inputs to the activation function are below zero, or whatever the threshold might be. It
should let you worry a bit less about how you initialize the network.


3.2.5 maxPooling2dLayer 


maxPooling2dLayer creates a layer that breaks the 2D input into rectangular pooling regions
and outputs the maximum value of each region. The input poolSize specifies the width
and height of a pooling region. poolSize can have one element (for square regions) or two
for rectangular regions. This is a way to reduce the number of inputs that need to be evaluated.
Typical images are more than a megapixel in size, and it is not practical to use every pixel as
an input. Furthermore, most images, or two-dimensional entities of any sort, don't really have
enough information to require finely divided regions. You can experiment with pooling and
see how it works for your application. An alternative is averagePooling2dLayer.
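Max pooling is easy to reproduce by hand. The sketch below pools a 4 by 4 matrix with a 2 by 2 region and a stride of 2; the input values are assumptions chosen so the result is easy to verify by eye.

```matlab
% 2-by-2 max pooling with a stride of 2 (illustrative sketch)
x = [1 3 2 0;
     4 2 1 1;
     0 1 5 6;
     2 2 7 8];
poolSize = 2;

n = size(x,1)/poolSize;              % number of regions per side
y = zeros(n,n);
for k = 1:n
  for j = 1:n
    r = (k-1)*poolSize + (1:poolSize);   % rows of this region
    c = (j-1)*poolSize + (1:poolSize);   % columns of this region
    y(k,j) = max(max(x(r,c)));
  end
end
disp(y)
```

The sixteen inputs are reduced to four, one maximum per region. Replacing max(max(...)) with mean(mean(...)) gives average pooling.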
Figure 3.3: Leaky reluLayer.


3.2.6 fullyConnectedLayer 


The fully connected layer connects all of the inputs to the outputs with weights and biases. For 
example: 


layer = fullyConnectedLayer(10);


creates ten outputs from any number of inputs. You don't have to specify the inputs. Effectively, 
this is the equation: 
y=ax+b (3.3) 


If there are m inputs and n outputs, b is a column bias vector of length n and a is an n by m weight matrix.
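The dimensions in (3.3) can be checked with a small numerical example. The weights, biases, and inputs below are arbitrary assumed values; in a trained net they come from the learning process.

```matlab
% Fully connected layer as a matrix equation (illustrative sketch)
m = 4;                          % number of inputs
n = 3;                          % number of outputs
a = reshape(1:n*m,n,m);         % n-by-m weight matrix (assumed values)
b = [0.1;0.2;0.3];              % n-by-1 bias vector (assumed values)
x = ones(m,1);                  % m-by-1 input

y = a*x + b;                    % n-by-1 output
disp(y')
```

Every output depends on every input, so the layer has n*m weights plus n biases, here 15 learnable parameters.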




3.2.7 softmaxLayer 


softmax converts a set of values into probabilities using the normalized exponential function, a
generalization of the logistic function. The largest value in the set receives the largest probability

$$p_k = \frac{e^{q_k}}{\sum_j e^{q_j}} \tag{3.4}$$

>> q = [1,2,3,4,1,2,3]
q =
     1     2     3     4     1     2     3
>> d = sum(exp(q));
>> p = exp(q)/d
p =
    0.0236    0.0643    0.1747    0.4748    0.0236    0.0643    0.1747


In this case, the maximum is element 4 in both q and p. This is just a method of smoothing
the inputs. Softmax is used for multiclass classification because it guarantees a well-behaved
probability distribution. Well behaved means that the sum of the probabilities is 1.
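For large inputs, exp can overflow. A common remedy, shown here as a sketch rather than as the toolbox implementation, is to subtract the maximum before exponentiating. This does not change the result because the common factor cancels in the ratio.

```matlab
% Numerically safer softmax for the same inputs
q = [1 2 3 4 1 2 3];
e = exp(q - max(q));      % the largest exponent is now zero
p = e/sum(e);

[pMax,kMax] = max(p);
fprintf('Sum of p = %g, largest is element %d with p = %6.4f\n',sum(p),kMax,pMax);
```

The probabilities match the values computed directly, they sum to 1, and the largest is still element 4.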


3.2.8 classificationLayer 


A classification layer computes the cross-entropy loss for multiclass classification problems 
with mutually exclusive classes. Let us define loss. Loss is the sum of the errors in training 
the neural net. It is not a percentage. For classification the loss is usually the negative log 
likelihood, which is 


$$L(y) = -\log(y) \tag{3.5}$$


where y is the output of the softmax layer. 

For regression it is the residual sum of squares. A high loss means a bad fit. 

Cross-entropy loss means that an item being classified can only be in one class. The number 
of classes is inferred from the output of the previous layer. In this problem, we have only 
two classes, circle or ellipse, so the number of outputs of the previous layer must be 2. Cross- 
entropy is the distance between the original probability distribution and what the model believes 
it should be. It is defined as 


$$H(y, p) = -\sum_i y_i \log p_i \tag{3.6}$$


where i is the index for the class, y is the true distribution, and p is the distribution predicted
by the model. It is a widely used replacement for mean squared error. It is
used in neural nets where softmax activations are in the output layer.
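A two-class example shows how (3.6) rewards a confident correct answer and penalizes a confident wrong one. The truth vector and the two predicted distributions are assumed values for illustration.

```matlab
% Cross-entropy for a two-class problem (illustrative sketch)
H     = @(y,p) -sum(y.*log(p));   % Equation (3.6)

y     = [1 0];                    % truth: the item is in class 1
pGood = [0.9 0.1];                % confident and correct (assumed)
pBad  = [0.2 0.8];                % confident and wrong (assumed)

fprintf('Loss for good prediction: %6.4f\n',H(y,pGood));
fprintf('Loss for bad prediction:  %6.4f\n',H(y,pBad));
```

Because the truth vector has a single 1, the sum reduces to the negative log of the probability assigned to the correct class, which is the negative log likelihood in (3.5).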
3.2.9 Structuring the Layers


For our first net to identify circles, we will use the following set of layers. The first layer is the 
input layer, for the 32x32 images. These are relatively low-resolution images. You can visually 
determine which are ellipses or circles so we would expect the neural network to be able to do 
the same. Nonetheless, the size of the input images is an important consideration. In our case, 
our images are tightly cropped around the shape. In a more general problem, the subject of 
interest, a cat, for example, might be in a general setting. 

We use a convolution2dLayer, batchNormalizationLayer, and reluLayer 
in sequence, with a pool layer in between. There are three sets of convolution layers, each with an 
increasing number of filters. The output set of layers consists of a fullyConnectedLayer, 
softmaxLayer, and finally, the classificationLayer. 


EllipsesNeuralNet.m

%% Define the layers for the net
% This gives the structure of the convolutional neural net
layers = [
    imageInputLayer(size(img))

    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer

    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer
    ];
3.3 Generating Data: Ellipses and Circles
3.3.1 Problem 


We want to generate images of ellipses and circles of arbitrary sizes and with different thick- 
nesses in MATLAB. 


3.3.2 Solution 


Write a MATLAB function to draw circles and ellipses and extract image data from the figure 
window. Our function will create a set of ellipses and a fixed number of perfect circles as 
specified by the user. The actual plot and the resulting downsized image will both be shown in 
a figure window so you can track progress and verify that the images look as expected. 


3.3.3 How It Works 


This is implemented in GenerateEllipses.m. The output of the function is a cell array 
with both the ellipse data and a set of image data obtained from a MATLAB figure using 
getframe. The function also outputs the type of image, that is, the “truth” data. 


GenerateEllipses.m

%% GENERATEELLIPSES Generate random ellipses
%% Form
%  [d, v] = GenerateEllipses(a,b,phi,t,n,nC,nP)
%% Description
% Generates random ellipses given a range of sizes and max rotation. The number
% of ellipses and circles must be specified; the total number generated is their
% sum. Opens a figure which displays the ellipse images in an animation after
% they are generated.
%% Inputs
%  a    (1,2) Range of a sizes of ellipse
%  b    (1,2) Range of b sizes of ellipse
%  phi  (1,1) Max rotation angle of ellipse
%  t    (1,1) Max line thickness in the plot of the circle
%  n    (1,1) Number of ellipses
%  nC   (1,1) Number of circles
%  nP   (1,1) Number of pixels, image is nP by nP
%% Outputs
%  d    {:,2} Ellipse data and image frames
%  v    (1,:) Boolean for circles, 1 (circle) or 0 (ellipse)


The first section of the code generates random ellipses and circles. They are all centered in 
the image. 
GenerateEllipses.m

nE    = n+nC;
d     = cell(nE,2);
r     = 0.5*(mean(a) + mean(b))*rand(1,nC)+a(1);
a     = (a(2)-a(1))*rand(1,n) + a(1);
b     = (b(2)-b(1))*rand(1,n) + b(1);
phi   = phi*rand(1,n);
cP    = cos(phi);
sP    = sin(phi);
theta = linspace(0,2*pi);
c     = cos(theta);
s     = sin(theta);
m     = length(c);
t     = 0.5+(t-0.5)*rand(1,nE);
aMax  = max([a(:);b(:);r(:)]);

% Generate circles
for k = 1:nC
  d{k,1} = r(k)*[c;s];
end

% Generate ellipses
for k = 1:n
  d{k+nC,1} = [cP(k) sP(k);-sP(k) cP(k)]*[a(k)*c;b(k)*s];
end

% True if the object is a circle
v       = zeros(1,nE);
v(1:nC) = 1;

The next section produces a 3D plot showing all the ellipses and circles. This is just to show
you what you have produced. The code puts all the ellipses between z = ±1. You might want to
adjust this when generating larger numbers of ellipses.


% 3D Plot
NewFigure('Ellipses');
z  = -1;
dZ = 2*abs(z)/nE;
o  = ones(1,m);
for k = 1:length(d)
  z  = z + dZ;
  zA = z*o;
  plot3(d{k}(1,:),d{k}(2,:),zA,'linewidth',t(k));
  hold on
end
grid on
rotate3d on


The next section converts the frames to nP by nP sized images in grayscale. We set the
figure and the axis to be square, and set the axis to "equal", so that the circles will have the
correct aspect ratio and in fact be circular in the images. Otherwise, they would also appear as
ellipses, and our neural net would not be able to categorize them. This code block also draws the
resulting resized image on the right-hand side of the window, with a title showing the current
step. There is a brief pause between each step. In effect, it is an animation that serves to inform
you of the script’s progress.


% Create images - this might take a while for a lot of images
f   = figure('Name','Images','visible','on','color',[1 1 1]);
ax1 = subplot(1,2,1,'Parent',f,'box','off','color',[1 1 1]);
ax2 = subplot(1,2,2,'Parent',f); grid on;
for k = 1:length(d)
  % Plot the ellipse and get the image from the frame
  plot(ax1,d{k}(1,:),d{k}(2,:),'linewidth',t(k),'color','k');
  axis(ax1,'off'); axis(ax1,'equal');
  axis(ax1,aMax*[-1 1 -1 1])
  frame   = getframe(ax1); % this call is what takes time
  imSmall = rgb2gray(imresize(frame2im(frame),[nP nP]));
  d{k,2}  = imSmall;
  % plot the resulting scaled image in the second axes
  imagesc(ax2,d{k,2});
  axis(ax2,'equal')
  colormap(ax2,'gray');
  title(ax2,['Image ' num2str(k)])
  set(ax2,'xtick',1:nP)
  set(ax2,'ytick',1:nP)
  colorbar(ax2)
  pause(0.2)
end
close(f)


The conversion is rgb2gray (imresize(frame2im(frame), [nP nP])), which 
performs these steps: 


1. Get the frame with frame2im 
2. Resize to nP by nP using imresize 
3. Convert to grayscale using rgb2gray 


Note that the image data originally ranges from 0 (black) to 255 (white), but is averaged
to lighter gray pixels during the resize operation. The colorbar in the progress window shows
you the span of the output image. The image looks black as before since it is plotted with
imagesc, which automatically scales the image to use the entire colormap, in this case the
gray colormap.
The built-in demo generates ten ellipses and five circles.


function Demo

a   = [0.5 1];
b   = [1 2];
phi = pi/4;
t   = 3;
n   = 10;
nC  = 5;
nP  = 32;

GenerateEllipses(a,b,phi,t,n,nC,nP);


Figure 3.4: Ellipses and a resulting image.


Figure 3.4 shows the generated ellipses and the first image displayed. 

The script CreateEllipses.m generates 100 ellipses and 100 circles and stores them
in the Ellipses folder along with the type of each image. Note that we have to do a small
trick with the filename. If we simply append the image number to the filename, 1, 2, 3, ... 200,
the images will not be in this order in the datastore; in alphabetical order, the images would
be sorted as 1, 10, 100, 101, and so on. In order to have the filenames in alphabetical order
match the order we are storing with the type, we generate a power of 10 greater than
the number of images and add it to the image index before appending it to the filename. Now we
have image 1001, 1002, and so on.
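The effect of the serial number is easy to demonstrate without writing any files. The sketch below builds both sets of names for 200 images and sorts them alphabetically; the cell arrays plain and names are hypothetical and exist only for this demonstration.

```matlab
% Compare alphabetical order with and without the serial number
nImages = 200;
kAdd    = 10^ceil(log10(nImages));      % 1000 for 200 images

plain = cell(1,nImages);
names = cell(1,nImages);
for k = 1:nImages
  plain{k} = sprintf('Ellipse%d.jpg',k);        % 1, 2, 3, ...
  names{k} = sprintf('Ellipse%d.jpg',k+kAdd);   % 1001, 1002, ...
end

sortedPlain = sort(plain);
sortedNames = sort(names);
fprintf('Plain names start: %s, %s, %s\n',sortedPlain{1:3});
fprintf('Serial names start: %s, %s, %s\n',sortedNames{1:3});
```

With the serial number added, every name has the same number of digits, so alphabetical order and numerical order agree and the saved labels line up with the files.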


CreateEllipses.m

%% Create ellipses to train and test the deep learning algorithm
% The ellipse images are saved as jpegs in the folder Ellipses.

% Parameters
nEllipses = 1000;
nCircles  = 1000;
nBits     = 32;
maxAngle  = pi/4;
rangeA    = [0.5 1];
rangeB    = [1 2];
maxThick  = 3.0;     % assumed: same thickness as the built-in demo

tic
[s, t] = GenerateEllipses(rangeA,rangeB,maxAngle,maxThick,nEllipses,nCircles,nBits);
toc

cd Ellipses
kAdd = 10^ceil(log10(nEllipses+nCircles)); % to make a serial number
for k = 1:length(s)
  imwrite(s{k,2},sprintf('Ellipse%d.jpg',k+kAdd));
end

% Save the labels
save('Type','t');
cd ..


Figure 3.5: Ellipses and a resulting image. 100 circles and 100 ellipse images are stored.


The graphical output is shown in Figure 3.5. It first displays the 100 circles and then the 
100 ellipses. It takes some time for the script to generate all the images. 

If you open the resulting jpegs, you will see that they are in fact 32x32 images with gray 
circles and ellipses. 


This recipe provides the data that will be used for the deep learning examples in the following
sections. You must run CreateEllipses.m before you can run the neural net examples.


3.4 Training and Testing 
3.4.1 Problem 


We want to train and test our deep learning algorithm on a wide range of ellipses and circles. 
3.4.2 Solution


The script that creates, trains, and tests the net is EllipsesNeuralNet.m. 


3.4.3 How It Works 


First we need to load in the generated images. The script in Recipe 3.3 generates 200 files. 
Half are circles and half ellipses. We will load them into an image datastore. We display a few 
images from the set to make sure we have the correct data and it is tagged correctly —that is, 
that the files have been correctly matched to their type, circle (1) or ellipse (0). 


EllipsesNeuralNet.m

%% Get the images
cd Ellipses
type = load('Type');
cd ..
t    = categorical(type.t);
imds = imageDatastore('Ellipses','labels',t);

labelCount = countEachLabel(imds);

% Display a few ellipses
NewFigure('Ellipses')
n  = 4;   % rows in the display grid (assumed value)
m  = 5;   % columns in the display grid (assumed value)
ks = sort(randi(length(type.t),1,n*m)); % random selection
for i = 1:n*m
  subplot(n,m,i);
  imshow(imds.Files{ks(i)});
  title(sprintf('Image %d: %d',ks(i),type.t(ks(i))))
end

% We need the size of the images for the input layer
img = readimage(imds,1);


Once we have the data, we need to create the training and testing sets. We have 100 files with
each label (0 or 1, for an ellipse or circle). We create a training set of 80% of the files and
reserve the remaining as a test set using splitEachLabel. Labels could be names, like
"circle" and "ellipse." You are generally better off with descriptive labels. After all, a 0
or 1 could mean anything. The MATLAB software handles many types of labels.


EllipsesNeuralNet.m

% Split the data into training and testing sets
fracTrain = 0.8;
[imdsTrain,imdsTest] = splitEachLabel(imds,fracTrain,'randomize');


The layers of the net are defined as in the previous recipe. The next step is training. The
trainNetwork function takes the data, set of layers, and options, runs the specified training
algorithm, and returns the trained network. This network is then invoked with the classify
function, as shown later in this recipe. This network is a series network. The network has other
methods which you can read about in the MATLAB documentation.


EllipsesNeuralNet.m

%% Training
% The mini-batch size should be less than the data set size; the mini-batch is
% used at each training iteration to evaluate gradients and update the weights.
options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...
    'MiniBatchSize',16, ...
    'MaxEpochs',5, ...
    'Shuffle','every-epoch', ...
    'ValidationData',imdsTest, ...
    'ValidationFrequency',2, ...
    'Verbose',false, ...
    'Plots','training-progress');

net = trainNetwork(imdsTrain,layers,options);


Figure 3.6 shows some of the ellipses used in the testing and training. They were obtained
randomly from the set using randi.

Figure 3.6: A subset of the ellipses used in the training and testing.

The training options need explanation. This is a subset of the parameter pairs available for
trainingOptions. The first input to the function, 'sgdm', specifies the training method.
There are three to choose from:


1. 'sgdm': Stochastic gradient descent with momentum
2. 'adam': Adaptive moment estimation (ADAM)
3. 'rmsprop': Root mean square propagation (RMSProp)


The 'InitialLearnRate' is the initial speed of learning. Higher learn rates mean
faster learning, but the training may get stuck in a suboptimal point. The default rate for sgdm
is 0.01. 'MaxEpochs' is the maximum number of epochs to be used in the training. In each
epoch, the training sees the entire training set, in batches of MiniBatchSize. The number
of iterations in each epoch is therefore determined by the amount of data in the set and the
MiniBatchSize. We are using a smaller data set so we reduce the MiniBatchSize from
the default of 128 to 16, which will give us 10 iterations per epoch. 'Shuffle' tells the
training how often to shuffle the training data. If you don't shuffle, the data will always be
used in the same order. Shuffling should improve the accuracy of the trained neural network.
'ValidationFrequency' is how often, in number of iterations, 'ValidationData'
is used to test the training. This validation data will be the data we reserved for testing
when using splitEachLabel. The default frequency is every 30 iterations. We can use
a validation frequency for our small problem of one, two, or five iterations. 'Verbose'
means print out status information to the command window. 'Plots' only has the option
'training-progress' (besides 'none'). This is the plot you see in this chapter.
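The iteration counts quoted above follow from simple arithmetic. The sketch assumes the 200-file data set described in the text and assumes that an incomplete final mini-batch is discarded, which is why floor is used.

```matlab
% Iterations per epoch for the settings used in this recipe
nFiles        = 200;     % images in the datastore, as described in the text
fracTrain     = 0.8;     % fraction used for training
miniBatchSize = 16;
maxEpochs     = 5;
valFrequency  = 2;       % validate every two iterations

nTrain         = round(fracTrain*nFiles);
itersPerEpoch  = floor(nTrain/miniBatchSize);   % assumes partial batches are dropped
totalIters     = itersPerEpoch*maxEpochs;
nValidations   = floor(totalIters/valFrequency);

fprintf('%d training images, %d iterations per epoch, %d iterations in total\n',...
        nTrain,itersPerEpoch,totalIters);
```

With a larger data set, the same mini-batch size gives proportionally more iterations per epoch, so you may want to raise the validation frequency to keep the training time down.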

Setting 'Padding' to 'same' in the convolution2dLayer means that the output size is
ceil(inputSize/stride), where inputSize is the height and width of the input.
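We can trace the image size through the three-stage net with this rule. The sketch assumes a 32 by 32 input, a convolution stride of 1 (the value when no stride is specified), and the standard size formula for a pooling layer without padding, floor((size - poolSize)/stride) + 1.

```matlab
% Trace the image height (or width) through the layers
s          = 32;     % input size
convStride = 1;      % convolution stride (assumed default)
poolSize   = 2;
poolStride = 2;

sizes    = zeros(1,5);
s        = ceil(s/convStride);                   % convolution 1, 'same' padding
sizes(1) = s;
s        = floor((s - poolSize)/poolStride) + 1; % max pooling 1
sizes(2) = s;
s        = ceil(s/convStride);                   % convolution 2
sizes(3) = s;
s        = floor((s - poolSize)/poolStride) + 1; % max pooling 2
sizes(4) = s;
s        = ceil(s/convStride);                   % convolution 3
sizes(5) = s;

disp(sizes)
```

The convolutions leave the size unchanged and each pooling layer halves it, so the fully connected layer receives 8 by 8 maps from each of the 32 filters in the last convolution.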

The training window runs in real time with the training process. The window is shown in
Figure 3.7. Our network starts with a 50% accuracy since we only have two classes, circles
and ellipses. Our accuracy approaches 100% in just five epochs, indicating that our classes of
images are readily distinguishable. The loss plot shows how well we are doing. The lower
the loss, the better the neural net. The loss plot approaches zero as the accuracy approaches
100%. In this case the validation data loss and the training data loss are about the same. This
indicates good fitting of the neural net with the data. If the validation data loss is greater than
the training data loss, the neural net is overfitting the data. Overfitting happens when you have
an overly complex neural network. You can fit the training data, but it may not perform very
well with new data, such as the validation data. For example, if you have a system which really
is linear, and you fit it to a cubic equation, it might fit the data well but doesn't really model the
real system. If the training data loss is greater than the validation data loss, your neural net is underfitting.
Underfitting happens when your neural net is too simple. The goal is to make both zero.

Finally, we test the net. Remember that this is a classification problem. An image is either 
an ellipse or a circle. We therefore use classify to implement the network. predLabels 
is the output of the net, that is, the predicted labels for the test data. This is compared to the 
truth labels from the datastore to compute an accuracy. 


EllipsesNeuralNet.m

%% Test the neural net
predLabels = classify(net,imdsTest);
testLabels = imdsTest.Labels;

accuracy = sum(predLabels == testLabels)/numel(testLabels);


Figure 3.7: The training window with a learn rate of 0.01. The top plot is the accuracy expressed
as a percentage.


The output of the testing is shown in the following. The accuracy of this run was 97.50%. 
On some runs, the net reaches 100%. 


>> EllipsesNeuralNet
ans =
  Figure (1: Ellipses) with properties:

      Number: 1
        Name: 'Ellipses'
       Color: [0.9400 0.9400 0.9400]
    Position: [560 528 560 420]
       Units: 'pixels'

  Show all properties

Accuracy is 97.50%
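The accuracy calculation can be tried without a trained net. The two label vectors below are made-up values standing in for the output of classify and the datastore labels; numeric labels are used so the sketch runs without any toolbox.

```matlab
% Accuracy from predicted and true labels (made-up data)
predLabels = [1 1 0 0 1 0 1 0 0 1];   % stand-in for the classifier output
testLabels = [1 1 0 0 1 0 1 0 1 1];   % stand-in for the truth labels

accuracy = sum(predLabels == testLabels)/numel(testLabels);
nWrong   = sum(predLabels ~= testLabels);

fprintf('Accuracy is %8.2f%% with %d of %d misclassified\n',...
        accuracy*100,nWrong,numel(testLabels));
```

Accuracy alone does not say which class was missed. Counting the errors separately for circles and ellipses is a useful next step when the two kinds of mistake matter differently.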
Figure 3.8: The training window with a learn rate of 0.01 and a leaky reluLayer.


We can try different activation functions. EllipsesNeuralNetLeaky shows a leaky 
reluLayer. We replaced reluLayer with leakyReluLayer. The output is similar, but in 
this case, learning was achieved even faster than before. See Figure 3.8 for a training run. 


EllipsesNeuralNetLeaky.m

% This gives the structure of the convolutional neural net
layers = [
    imageInputLayer(size(img))

    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    leakyReluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    leakyReluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    leakyReluLayer

    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer
    ];


The output with the leaky layer is shown as follows. 


>> EllipsesNeuralNetLeaky 
ans = 
Figure (1: Ellipses) with properties: 


Number: 1 
Name: 'Ellipses' 
Color: [0.9400 0.9400 0.9400] 
Position: [560 528 560 420] 
Units: 'pixels' 


Show all properties 
Accuracy is 84.25%


We can try fewer layers. EllipsesNeuralNetOneLayer has only one set of layers. 


EllipsesNeuralNetOneLayer.m 


%% Define the layers for the net
% This gives the structure of the convolutional neural net
layers = [
    imageInputLayer(size(img))

    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer

    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer
    ];

analyzeNetwork(layers)


The results shown in Figure 3.9 with only one set of layers are still pretty good. This shows 
that you need to try different options with your net architecture as well. With this size of a 
problem, multiple layers are not buying very much. 

>> EllipsesNeuralNetOneLayer
ans =
  Figure (2: Ellipses) with properties:

      Number: 2
        Name: 'Ellipses'
       Color: [0.9400 0.9400 0.9400]
    Position: [560 528 560 420]
       Units: 'pixels'

  Show all properties
Accuracy is 87.25%

Figure 3.9: The training window for a net with one set of layers.


The one-set network is short enough that the whole thing can be visualized inside the window 
of analyzeNetwork, as in Figure 3.10. This function will check your layer architecture 
before you start training and alert you to any errors. The sizes of the activations and 
"Learnables" are displayed explicitly. 
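The learnable sizes reported by the analyzer can be checked by hand. The following sketch assumes the 32x32x1 input and the one-set layer list above; the variable names are introduced here for illustration.

```matlab
% Count the learnables of the one-set network for a 32x32x1 input.
inputSize  = [32 32 1];
filterSize = 3;  nFilters = 8;  nClasses = 2;

nConv  = filterSize^2*inputSize(3)*nFilters + nFilters; % 3x3x1x8 weights + 8 biases
nBatch = 2*nFilters;                                    % offset + scale per channel
nAct   = inputSize(1)*inputSize(2)*nFilters;            % 'same' padding keeps 32x32
nFC    = nClasses*nAct + nClasses;                      % 2x8192 weights + 2 biases
nTotal = nConv + nBatch + nFC;
fprintf('conv %d, batchnorm %d, fc %d, total %d\n',nConv,nBatch,nFC,nTotal);
```

Nearly all of the learnables sit in the fully connected layer, because no pooling layer reduces the 32x32x8 activations before it.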


Figure 3.10: The analyze window for the one-set convolutional network.

Layer        Type                   Activations  Learnables
imageinput   Image Input            32x32x1
conv         Convolution            32x32x8      Weights 3x3x1x8, Bias 1x1x8
batchnorm    Batch Normalization    32x32x8      Offset 1x1x8, Scale 1x1x8
relu         ReLU                   32x32x8
fc           Fully Connected        1x1x2        Weights 2x8192, Bias 2x1
softmax      Softmax                1x1x2
classoutput  Classification Output


That concludes this chapter. We both generated our own image data and trained a neural 
net to classify features in our images! In this example, we were able to achieve 100% accuracy, 
but not before some debugging of how the images were created and named. It is critical 
to carefully examine your training and test data to ensure it contains the features you wish to 
identify. You should be prepared to experiment with your layers and training parameters as you 
develop nets for different problems. 

CHAPTER 4

Classifying Movies 


4.1 Introduction 


Netflix, Hulu, and Amazon Prime all attempt to help you pick movies. In this chapter, we will 
create a database of movies, with fictional ratings. We will then create a set of viewers. We 
will then try to predict if a viewer would choose to watch a particular movie. We will use Deep 
Learning with MATLAB’s pattern recognition network, patternnet. You will see that we 
can achieve accuracies of up to 100% over our small set of movies. Guessing what a customer 
would like to buy is something that all manufacturers and retailers want to do as it lets them 
focus their efforts on products that are of the greatest interest to their customers. As we show 
in this chapter, deep learning can be a valuable tool. 


4.2 Generating a Movie Database 
4.2.1 Problem 


We first need to generate a database of movies. 


4.2.2 Solution 


Write a MATLAB function, CreateMovieDatabase.m, to create a database of movies. 
The movies will have fields for genre, reviewer ratings (like IMDb), and the viewer ratings. 


4.2.3 How It Works 


We first need to come up with a method for characterizing movies. Table 4.1 gives our system. 
MPAA stands for Motion Picture Association of America, the organization that rates the 
movies. Other systems are possible, but this will be sufficient to test out our deep learning 
system. Of the five characteristics, three are strings and two are numbers. One number, length, 
is a continuum, while rating has discrete values. The second number, quality, is based on the 
"stars" in the rating. Some movie databases, like IMDb, have fractional values because they 
average over all their users. We created our own MPAA ratings and genres based on our opinions. 
The real MPAA ratings may be different. 

Table 4.1: The movie database will have five characteristics.

Characteristic  Value
Genre           Type of movie (Animated, Comedy, Dance, Drama, Fantasy, Romance, SciFi, 
                War) in a string


Length can be any duration. We'll use randn to generate the lengths around a mean of 1.8 
hours and a standard deviation of 0.15 hours. Length is a floating point number. Stars are one 
to five and must be integers. 
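A minimal sketch of how such values can be drawn is shown below. The variable names nMovies, len, and stars are introduced here for illustration; the mean, standard deviation, and star range are the ones stated above.

```matlab
% Draw random lengths and star ratings for a set of movies.
nMovies = 100;
len     = 1.8 + 0.15*randn(1,nMovies);  % hours, mean 1.8, standard deviation 0.15
stars   = randi(5,1,nMovies);           % integers from 1 to 5

fprintf('Mean length %.2f h, star range %d to %d\n',mean(len),min(stars),max(stars));
```

Because randn is unbounded, an occasional very short or very long movie is possible; for 100 movies with this spread that is unlikely to matter.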

We created an Excel file with the names of 100 real movies, which is included with the 
book's software. We assigned genres and MPAA ratings (PG, R, etc.) to them. These were 
made up by the authors. The length and rating were left blank. We then saved the Excel file as 
tab-delimited text, and the function searches for tabs in each line. (There are other ways to 
import data from Excel and text files in MATLAB; this is just one example.) We then assign the 
data to the fields. The function checks whether the maximum length or rating is zero, which it 
is for all the movies in this case, and then creates random values. You can create a spreadsheet 
with rating values as an extension of this recipe! We use str2double since it is faster than 
str2num when you know that the value is a single number. fgetl reads in one line and ignores 
the end of line characters. 
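The tab search can be tried on a single line before running it on a file. The sketch below uses a made-up line with the field order name, rating, genre, length, MPAA and with the rating and length left empty; the genre and MPAA values in it are placeholders, not entries from the book's file.

```matlab
% Parse one hypothetical tab-delimited line with empty rating and length fields.
t = sprintf('\t');                        % a tab character
q = ['Alien' t '' t 'SciFi' t '' t 'R'];  % made-up example line

j      = strfind(q,t);                    % positions of the four tabs
name   = q(1:j(1)-1);                     % text before the first tab
rating = str2double(q(j(1)+1:j(2)-1));    % empty field, so NaN
genre  = q(j(2)+1:j(3)-1);
len    = str2double(q(j(3)+1:j(4)-1));    % empty field, so NaN
mPAA   = q(j(4)+1:end);                   % text after the last tab
```

Indexing with an empty range such as q(7:6) returns an empty character array, which str2double converts to NaN rather than raising an error.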

You'll notice that we check for NaN in the length and rating fields, since str2double 
returns NaN for an empty string:

>> str2double('')
ans =
   NaN


CreateMovieDatabase.m 


function d = CreateMovieDatabase( file )

if( nargin < 1 )
  Demo
  return
end

f = fopen(file,'r');

d.name   = {};
d.rating = [];
d.length = [];
d.genre  = {};
d.mPAA   = {};
t        = sprintf('\t'); % a tab character
k        = 0;

while(~feof(f))
  k             = k + 1;
  q             = fgetl(f);      % one line of the file
  j             = strfind(q,t);  % find the tabs in the line
  d.name{k}     = q(1:j(1)-1);   % the name is the first token
  d.rating(1,k) = str2double(q(j(1)+1:j(2)-1));
  d.genre{k}    = q(j(2)+1:j(3)-1);
  d.length(1,k) = str2double(q(j(3)+1:j(4)-1));
  d.mPAA{k}     = q(j(4)+1:end);
end % end of the file

if( max(d.rating) == 0 || isnan(d.rating(1)) )
  d.rating = randi(5,1,k);
end

if( max(d.length) == 0 || isnan(d.length(1)) )
  d.length = 1.8 + 0.15*randn(1,k);
end

fclose(f);


Running the function demo, shown in the following, creates a database of movies in a data 
structure. 


function Demo

file = 'Movies.txt';
d    = CreateMovieDatabase( file )


The output is the following. 

>> CreateMovieDatabase

d =

  struct with fields:

      name: {1 x 100 cell}
    rating: [1 x 100 double]
    length: [1 x 100 double]
     genre: {1 x 100 cell}
      mPAA: {1 x 100 cell}

Here are the first few movies:

>> d.name'
ans =
  100x1 cell array
    {'2001: A Space Odyssey'}
    {'A Star is Born'}
    {'Alien'}
    {'Aliens'}
    {'Amadeus'}
    {'Apocalypse Now'}
    {'Apollo 13'}
    {'Back to the Future'}


4.3 Generating a Movie Watcher Database 
4.3.1 Problem 


We next need to generate a database of movie watchers for training and testing. 


4.3.2 Solution 


Write a MATLAB function, CreateMovieViewers.m, to create a series of watchers. We 
will use a probability model to select which of the movies each viewer has watched based on 
the movie's genre, length, and ratings. 


4.3.3 How It Works 


Each watcher will have seen a fraction of the 100 movies in our database; the number watched 
will be a random integer between 20 and 60. Each movie watcher will have a probability for each 
characteristic: the probability that they would watch a movie rated 1 or 5 stars, the probability 
that they would watch a movie in a given genre, and so on. (Some viewers enjoy watching so-called 
"turkeys"!) We will combine the probabilities to determine the movies the viewer has watched. 
For mPAA, genre, and rating, the probabilities will be discrete. For the length, it will be a 
continuous distribution. You could argue that a watcher would always want the highest rated 
movie, but remember that this rating is an aggregate of other people's opinions, so it may not 
map directly onto the particular viewer. The only output of this function is a list of movie 
numbers for each user. The list is in a cell array. 

We start by creating cell arrays of the categories. We then loop through the viewers and 
compute probabilities for each movie characteristic. We then loop through the movies and 
compute the combined probabilities. This results in a list of movies watched by each viewer. 


CreateMovieViewers.m

function [mvr,pWatched] = CreateMovieViewers( nViewers, d )

if( nargin < 1 )
  Demo
  return
end

mvr   = cell(1,nViewers);
nMov  = length(d.name);
genre = { 'Animated', 'Comedy', 'Dance', 'Drama', 'Fantasy', 'Romance',...
          'SciFi', 'War', 'Horror', 'Music', 'Crime'};
mPAA  = {'PG-13','R','PG'};

% Loop through viewers. The inner loop is movies.
for j = 1:nViewers
  % Probability of watching each MPAA
  rMPAA = rand(1,length(mPAA));
  rMPAA = rMPAA/sum(rMPAA);

  % Probability of watching each Rating (1 to 5 stars)
  r = rand(1,5);
  r = r/sum(r);

  % Probability of watching a given Length
  mu    = 1.5 + 0.5*rand; % preferred movie length, between 1.5 and 2 hrs
  sigma = 0.5*rand;       % standard deviation, up to 1/2 hour

  % Probability of watching by Genre
  rGenre = rand(1,length(genre));
  rGenre = rGenre/sum(rGenre);

  % Compute the likelihood the viewer watched each movie
  pWatched = zeros(1,nMov);
  for k = 1:nMov
    pRating     = r(d.rating(k));              % probability for this rating
    i           = strcmp(d.mPAA{k},mPAA);      % logical array with one match
    pMPAA       = rMPAA(i);                    % probability for this MPAA
    i           = strcmp(d.genre{k},genre);    % logical array
    pGenre      = rGenre(i);                   % probability for this genre
    pLength     = Gaussian(d.length(k),sigma,mu); % probability for this length
    pWatched(k) = 1 - (1-pRating)*(1-pMPAA)*(1-pGenre)*(1-pLength);
  end

  % Sort the movies and pick the most likely to have been watched
  nInterval = floor([0.2 0.6]*nMov);
  nMovies   = randi(nInterval);
  [~,i]     = sort(pWatched);
  mvr{j}    = i(1:nMovies);
end

This code computes the Gaussian or normal probability. The inputs include a standard 
deviation sigma and a mean mu. 

%% CreateMovieViewers>Gaussian
% The probability is 1 when x==mu and declines for shorter or longer movies
function p = Gaussian(x,sigma,mu)

p = exp(-(x-mu)^2/(2*sigma^2));
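The combination rule in the listing can be read as the probability that at least one characteristic attracts the viewer, treating the four characteristics as independent. That reading is an interpretation, and the numbers in this sketch are made up for illustration.

```matlab
% Illustrative per-characteristic probabilities for one movie (made-up values).
pRating = 0.30;  pMPAA = 0.40;  pGenre = 0.10;

% Length term: unnormalized Gaussian, equal to 1 at the preferred length mu.
mu = 1.75;  sigma = 0.25;  len = 2.0;            % hours
pLength = exp(-(len-mu)^2/(2*sigma^2));

% One minus the probability that none of the characteristics attracts the viewer.
p        = [pRating pMPAA pGenre pLength];
pWatched = 1 - prod(1-p);
fprintf('pLength = %.4f, pWatched = %.4f\n',pLength,pWatched);
```

With this rule the combined value is never smaller than the largest individual probability, so a single strong preference is enough to make a movie likely.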


The built-in function demo follows the Gaussian function. It is run automatically if the 
function is called with no inputs. 


%% CreateMovieViewers>Demo
function Demo

s = load('Movies.mat');


The output from the demo is shown next. This shows how many of the movies in the 
database each viewer has watched; the most is 57 movies and the fewest is 26. 


>> CreateMovieViewers
mvr =

  1x4 cell array

    {1x33 double}    {1x57 double}    {1x51 double}    {1x26 double}


4.4 Training and Testing 
4.4.1 Problem 


We want to test a deep learning algorithm to select new movies for the viewer, based on what 
the algorithm thinks a viewer would choose to watch. 


4.4.2 Solution 


Create a viewer database and train a pattern recognition neural net on the viewer's movie 
selections. This is done in the script MovieNN. We will train a neural net for each viewer in the 
database. 

4.4.3 How It Works 

First, the movie data is loaded and displayed. 


MovieNN.m 
%% Data
genre  = { 'Animated', 'Comedy', 'Dance', 'Drama', 'Fantasy',...
           'Romance', 'SciFi', 'War', 'Horror', 'Music', 'Crime'};
mPAA   = {'PG-13','R','PG'};
rating = {'*' '**' '***' '****' '*****'};

%% The movies
s = load('Movies.mat');
NewFigure('Movie Data')
subplot(2,2,1)
histogram(s.d.length)
xlabel('Movie Length')
ylabel('# Movies')
subplot(2,2,2)
histogram(s.d.rating)
xlabel('Stars')
ylabel('# Movies')
subplot(2,1,2)
histogram(categorical(s.d.genre))
ylabel('# Movies')
set(gca,'xticklabelrotation',90)


The viewer database is then created from the movie database: 


MovieNN.m 
%% The movie viewers
nViewers = 4;
mvr      = CreateMovieViewers( nViewers, s.d );


The next block displays the characteristics of the movies each viewer has watched. This is 
shown graphically in Figure 4.1. For the moment, there are only four viewers. 


%% Display the movie viewer's data
lX = linspace(min(s.d.length),max(s.d.length),5);

for k = 1:nViewers
  NewFigure(sprintf('Viewer %d',k));

  subplot(2,2,1);
  g = zeros(1,11);
  for j = 1:length(mvr{k})
    i    = mvr{k}(j);
    l    = strmatch(s.d.genre{i},genre); %#ok<MATCH2>
    g(l) = g(l) + 1;
  end
  bar(1:11,g);
  set(gca,'xticklabel',genre,'xticklabelrotation',90,'xtick',1:11)
  xlabel('Genre')
  title(sprintf('Viewer %d',k))
  grid on

  subplot(2,2,2);
  g = zeros(1,5);
  for j = 1:5
    for i = 1:length(mvr{k})
      if( s.d.length(mvr{k}(i)) > lX(j) )
        g(j) = g(j) + 1;
      end
    end
  end
  bar(1:5,g);
  set(gca,'xticklabel',floor(lX*60),'xtick',1:5)
  xlabel('Length Greater Than (min)')
  grid on

  subplot(2,2,3);
  g = zeros(1,5);
  for j = 1:length(mvr{k})
    i    = mvr{k}(j);
    l    = s.d.rating(i);
    g(l) = g(l) + 1;
  end
  bar(1:5,g);
  set(gca,'xticklabel',rating,'xticklabelrotation',90,'xtick',1:5)
  xlabel('Rating')
  grid on

  subplot(2,2,4);
  g = zeros(1,3);
  for j = 1:length(mvr{k})
    i    = mvr{k}(j);
    l    = strmatch(s.d.mPAA{i},mPAA); %#ok<MATCH2>
    g(l) = g(l) + 1;
  end
  bar(1:3,g);
  set(gca,'xticklabel',mPAA,'xticklabelrotation',90,'xtick',1:3)
  xlabel('mPAA')
  grid on
end

Figure 4.1: The watched movie data for four viewers. Each viewer's figure has four bar charts: Genre, Length Greater Than (min), Rating, and mPAA.

We use bar charts throughout. Notice how we make the x labels strings for the genre and 
so on. We also rotate them 90 degrees for clarity. The length is the number of movies longer 
than that number. 

This data is based on our viewer model from the previous recipe which is based on joint 
probabilities. We will train the neural net on a subset of the movies. This is a classification 
problem. We just want to know if a given movie would be picked or not picked by the viewer. 

We use patternnet to predict the movies watched. This is shown in the next code 
block. The input to patternnet is the sizes of the hidden layers, in this case a single layer 
of size 40. We convert everything into integers. Note that you need to round the results since 
patternnet does not return integers, despite the label being an integer. patternnet has 
methods train and view. 


%% Train and test the neural net for each viewer
for k = 1:nViewers
  % Create the training arrays
  x = zeros(4,100); % the input data
  y = zeros(1,100); % the target - did the viewer watch the movie?

  nMov = length(mvr{k}); % number of watched movies
  for j = 1:nMov
    i      = mvr{k}(j); % index of the jth movie watched by the kth viewer
    x(1,j) = s.d.rating(i);
    x(2,j) = s.d.length(i);
    x(3,j) = strmatch(s.d.mPAA{i},mPAA,'exact'); %#ok<*MATCH3>
    x(4,j) = strmatch(s.d.genre{i},genre,'exact');
    y(1,j) = 1; % movie watched
  end

  i = setdiff(1:100,mvr{k}); % unwatched movies
  for j = 1:length(i)
    x(1,nMov+j) = s.d.rating(i(j));
    x(2,nMov+j) = s.d.length(i(j));
    x(3,nMov+j) = strmatch(s.d.mPAA{i(j)},mPAA,'exact');
    x(4,nMov+j) = strmatch(s.d.genre{i(j)},genre,'exact');
    y(1,nMov+j) = 0; % movie not watched
  end

  % Create the training and testing data
  j      = randperm(100);
  j      = j(1:70); % train using 70% of the available data
  xTrain = x(:,j);
  yTrain = y(1,j);
  j      = setdiff(1:100,j);
  xTest  = x(:,j);
  yTest  = y(1,j);

  net   = patternnet(40); % input a scalar or row of layer sizes
  net   = train(net,xTrain,yTrain);
  view(net);
  yPred = round(net(xTest));

  %% Test the neural net
  accuracy = sum(yPred == yTest)/length(yTest);
  fprintf('Accuracy for viewer %d (%d movies watched) is %8.2f%%\n',...
    k,nMov,accuracy*100)
end
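The listing turns the MPAA and genre strings into integer codes with strmatch, which the Code Analyzer flags as not recommended (hence the %#ok pragmas). As a sketch of the same encoding step, the loop below uses find with strcmp instead. This is a substitution for illustration, not the book's code, and the shortened genre list and sample movies are made up.

```matlab
% Encode category strings as integer indices into a reference list.
genre      = {'Animated','Comedy','Dance','Drama'};  % shortened list for illustration
movieGenre = {'Drama','Comedy','Drama','Animated'};  % made-up sample movies

code = zeros(1,numel(movieGenre));
for k = 1:numel(movieGenre)
  code(k) = find(strcmp(movieGenre{k},genre));  % exact match returns one index
end
disp(code)
```

strcmp compares whole strings, so 'PG' would not be confused with 'PG-13', which is the reason the listing passes 'exact' to strmatch.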


The training window is shown in Figure 4.3. When we view the net, MATLAB opens the 
display in Figure 4.2. Each net has four inputs, for the movie’s rating, length, genre, and MPAA 
classification. The net’s single output is the classification for whether the viewer has watched 
the movie or not. The training window provides access to additional plots of the training and 
performance data. 

Figure 4.2: The patternnet network with four inputs and one output.


The output of the script is shown below for patternnet(40). The accuracy is the percentage 
of the movies in the test set (30% of the data available) for which the net correctly predicted 
whether the viewer watched the movie. The accuracy is usually between 65% and 90% for this size 
hidden layer. 
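The accuracy calculation can be tried on its own with a handful of values. In this sketch the network outputs and truth labels are made up, and the two error counts are an addition for illustration that the script itself does not compute.

```matlab
% Made-up network outputs and truth labels for six test movies.
yOut  = [0.91 0.12 0.55 0.48 0.03 0.77];  % raw outputs, not integers
yTest = [1    0    0    1    0    1   ];  % 1 = watched, 0 = not watched

yPred    = round(yOut);                        % threshold at 0.5
accuracy = sum(yPred == yTest)/length(yTest);  % fraction predicted correctly
nMissed  = sum(yPred == 0 & yTest == 1);       % watched, but predicted not watched
nFalse   = sum(yPred == 1 & yTest == 0);       % not watched, but predicted watched
fprintf('Accuracy %.2f%%, missed %d, false %d\n',accuracy*100,nMissed,nFalse);
```

Counting the two kinds of error separately shows whether a net is mostly over-recommending or under-recommending, which the single accuracy number hides.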


>> MovieNN 


Accuracy for viewer 1 (58 movies watched) is 83.33% 
Accuracy for viewer 2 (51 movies watched) is 76.67% 
Accuracy for viewer 3 (23 movies watched) is 83.33% 
Accuracy for viewer 4 (54 movies watched) is 100.00% 


patternnet(40) returns good results, but we also tried larger and smaller layer sizes 
and multiple hidden layers. For example, with a layer size of just 5, the accuracy ranges 
from 50 to 70%. With a size of 50, we reached over 90% for all viewers! Granted, this is a 
small number of movies. The results will vary with each run due to the random nature of the 
variables in the test. The predictions are probably as good as Netflix! It is important to note 
that the neural network did not know anything about the viewer model. Nonetheless, it does 
a good job of predicting movies that the viewer might like. This is one of the advantages of 
neural nets. 

Figure 4.3: Patternnet training window. The window (nntraintool) reports Data Division: Random (dividerand); Training: Scaled Conjugate Gradient (trainscg); Performance: Cross-Entropy (crossentropy); Calculations: MEX. The available plots are Performance (plotperform), Training State (plottrainstate), Error Histogram (ploterrhist), Confusion (plotconfusion), and Receiver Operating Characteristic (plotroc).

CHAPTER 5

Algorithmic Deep Learning 


In this chapter, we introduce the Algorithmic Deep Learning Neural Network (ADLNN), a 
deep learning system that incorporates algorithmic descriptions of the processes as part of the 
deep learning neural network. The dynamical models provide domain knowledge. These are in 
the form of differential equations. The outputs of the network are both indications of failures 
and updates to the parameters of the models. Training can be done using simulations, prior to 
operations, or through operator interaction during operations. 

The system is shown in Figure 5.1. This is based on work from the books by Paluszek 
and Thomas [30, 29]. These books show the relationships between machine learning, Adaptive 
Control, and Estimation. This model can be encapsulated in a set of differential equations. We 
will limit ourselves to sensor failures in this example. 

The output indicates what kind of failures have occurred. It indicates that either one or both 
of the sensors have failed. 

Figure 5.2 shows an air turbine.¹ This air turbine has a constant pressure air supply. The 
pressurized air causes the turbine to spin. It is a way to produce rotary motion for a drill or 
other purpose. 

We can control the valve from the air supply, the pressure regulator, to control the speed 
of the turbine. The air flows past the turbine blades causing it to turn. The control needs to 
adjust the air pressure to handle variations in the load. The load is the resistance to turning. 
For example, a drill might hit a harder material while in use. We measure the air pressure $p$ 
downstream from the valve, and we also measure the rotational speed of the turbine $\omega$ with a 
tachometer. 

The dynamical model for the air turbine is 

$$
\begin{bmatrix} \dot{p} \\ \dot{\omega} \end{bmatrix} =
\begin{bmatrix} -\frac{1}{\tau_p} & 0 \\ \frac{K_t}{\tau_t} & -\frac{1}{\tau_t} \end{bmatrix}
\begin{bmatrix} p \\ \omega \end{bmatrix} +
\begin{bmatrix} \frac{K_p}{\tau_p} \\ 0 \end{bmatrix} u
\tag{5.1}
$$

¹ PhD thesis of Jere Schenck Meserole, "Detection Filters for Fault-Tolerant Control of Turbofan Engines," 
Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 1981. 

© Michael Paluszek and Stephanie Thomas 2020 
M. Paluszek and S. Thomas, Practical MATLAB Deep Learning, 
https://doi.org/10.1007/978-1-4842-5124-9_5 

Figure 5.1: Algorithmic Deep Learning Neural Network (ADLNN). The network uses numerical 
models as a filtering layer. The numerical models are a set of differential equations configured 
as a detection filter. 

This is a state space system 

$$
\dot{x} = a x + b u \tag{5.2}
$$

where 

$$
a = \begin{bmatrix} -\frac{1}{\tau_p} & 0 \\ \frac{K_t}{\tau_t} & -\frac{1}{\tau_t} \end{bmatrix} \tag{5.3}
$$

$$
b = \begin{bmatrix} \frac{K_p}{\tau_p} \\ 0 \end{bmatrix} \tag{5.4}
$$

The state vector is 

$$
x = \begin{bmatrix} p \\ \omega \end{bmatrix} \tag{5.5}
$$

The pressure downstream from the regulator is equal to $K_p u$ when the system is in equilibrium. 
$\tau_p$ is the regulator time constant, and $\tau_t$ is the turbine time constant. The turbine speed is 
$K_t p$ when the system is in equilibrium. The tachometer measures $\omega$ and the pressure sensor 
measures $p$. The load is folded into the time constant for the turbine. 

The code for the right-hand side of the dynamical equations is shown in the following. Only 
one line of code computes the right-hand side; the rest returns the default data structure. The 
simplicity of the model is due to its being a state space model. The number of states could be 
large, yet the code would not change. 

Figure 5.2: Air turbine. The arrows show the airflow. The air flows through the turbine blade 
tips causing it to turn. Labeled components: constant pressure air supply, pressure regulator, pressure sensor, turbine, and tachometer.

RHSAirTurbine.m 

if( nargin < 1 )
  kP   = 1;  % digits garbled in the source; 1 is assumed here
  kT   = 2;  % digits garbled in the source; 2 is assumed here
  tauP = 10; % digits garbled in the source; 10 is assumed here
  tauT = 40;
  c    = eye(2);
  b    = [kP/tauP;0];
  a    = [-1/tauP 0; kT/tauT -1/tauT];

  xDot = struct('a',a,'b',b,'c',c,'u',0);
  if( nargout == 0 )
    disp('RHSAirTurbine struct:');
  end
  return
end

% Derivative
xDot = d.a*x + d.b*d.u;


The simulation, AirTurbineSim.m, is shown in the following. The control is a constant, 
also known as a step input. TimeLabel converts the time vector into units (minutes, hours, 
etc.) that are easier to read. It also returns a label for the time units that you can use in the plots. 

AirTurbineSim.m 

%% Initialization
tEnd = 1000; % sec

% State space system
d = RHSAirTurbine;

% This is the regulator input.
d.u = 100; % value garbled in the source; 100 is assumed here

dT = 0.02; % sec
n  = ceil(tEnd/dT);

% Initial state
x = [0;0];

%% Run the simulation

% Plotting array
xP = zeros(2,n);
t  = (0:n-1)*dT;

for k = 1:n
  xP(:,k) = x;
  x       = RungeKutta( @RHSAirTurbine, t(k), x, dT, d );
end

%% Plot the states and residuals
[t,tL] = TimeLabel(t);
yL     = {'p (N/m^2)' '\omega (rad/s)' };
tTL    = 'Air Turbine Simulation';
PlotSet( t, xP, 'x label', tL, 'y label', yL, 'figure title', tTL )


The response to a step input for u is shown in Figure 5.3. The pressure settles faster than 
the turbine angular velocity. This is due to the turbine time constant and the lag in the pressure 
change. 
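The difference in settling time can be checked against the closed-form step response of the linear model. The sketch below does not use the book's RungeKutta or RHSAirTurbine functions, and the parameter values in it are assumptions chosen for illustration.

```matlab
% Assumed parameters for illustration only.
kP = 1;  kT = 2;  tauP = 10;  tauT = 40;  u = 100;
a  = [-1/tauP 0; kT/tauT -1/tauT];
b  = [kP/tauP; 0];

% Closed-form response to a step in u, starting from rest.
t = 0:1:400;                                  % sec
p = kP*u*(1 - exp(-t/tauP));                  % single first-order lag
w = kT*kP*u*(1 - (tauT*exp(-t/tauT) - tauP*exp(-t/tauP))/(tauT - tauP)); % two lags in series

% Equilibrium of the state space model, from a*x + b*u = 0.
xEq = -a\(b*u);                               % [kP*u; kT*kP*u]

% Time for each state to reach 95% of its final value.
t95p = t(find(p >= 0.95*xEq(1),1));
t95w = t(find(w >= 0.95*xEq(2),1));
fprintf('95%% settling: pressure %g s, speed %g s\n',t95p,t95w);
```

The speed lags the pressure because it is driven by the pressure through a second, slower time constant, which matches the behavior described for Figure 5.3.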

Now that we understand better how an air turbine works, we can build the filter to detect its 
sensor failures. It is always a good idea to understand your dynamical system. When building 
an algorithmic filter or estimator, this is a necessity. For a neural net, it is not strictly 
necessary just to get something working, but it really helps when interpreting the neural net 
performance. 

Figure 5.3: Air turbine response to a step pressure regulator input. 

5.1 Building a Detection Filter 
5.1.1 Problem 


We want to build a system to detect failures in our air turbine using the linear model developed 
in the previous recipe. 


5.1.2 Solution 


We will build a detection filter that detects pressure regulator failures and tachometer failures. 
Our plant model (continuous a, b, and c state space matrices) will be an input to the filter 
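building function. 

5.1.3 How It Works 

The detection filter is an estimator with a specific gain matrix that multiplies the residuals. 

$$
\begin{bmatrix} \dot{\hat{p}} \\ \dot{\hat{\omega}} \end{bmatrix} =
\begin{bmatrix} -\frac{1}{\tau_p} & 0 \\ \frac{K_t}{\tau_t} & -\frac{1}{\tau_t} \end{bmatrix}
\begin{bmatrix} \hat{p} \\ \hat{\omega} \end{bmatrix} +
\begin{bmatrix} \frac{K_p}{\tau_p} \\ 0 \end{bmatrix} u +
\begin{bmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{bmatrix}
\begin{bmatrix} p - \hat{p} \\ \omega - \hat{\omega} \end{bmatrix}
\tag{5.6}
$$

where $\hat{p}$ is the estimated pressure and $\hat{\omega}$ is the estimated angular rate of the turbine. The D 
matrix is the matrix of detection filter gains. This matrix multiplies the residuals, the difference 
between the measured and estimated states, into the detection filter. The residual vector is 

$$
r = \begin{bmatrix} p - \hat{p} \\ \omega - \hat{\omega} \end{bmatrix} \tag{5.7}
$$

The D matrix needs to be selected so that this vector tells us the nature of the failure. The gains 
should be selected so that 

1. The filter is stable. 

2. If the pressure regulator fails, the first residual $p - \hat{p}$ is nonzero, but the second remains 
zero. 

3. If the tachometer fails, the second residual $\omega - \hat{\omega}$ is nonzero, but the first remains zero. 

The gain matrix is 

$$
D = a + \begin{bmatrix} \frac{1}{\tau_1} & 0 \\ 0 & \frac{1}{\tau_2} \end{bmatrix} \tag{5.8}
$$

We can see this by substituting this D into Equation 5.6. 

$$
\begin{bmatrix} \dot{\hat{p}} \\ \dot{\hat{\omega}} \end{bmatrix} =
\begin{bmatrix} -\frac{1}{\tau_1} & 0 \\ 0 & -\frac{1}{\tau_2} \end{bmatrix}
\begin{bmatrix} \hat{p} \\ \hat{\omega} \end{bmatrix} +
\begin{bmatrix} \frac{K_p}{\tau_p} \\ 0 \end{bmatrix} u +
\begin{bmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{bmatrix}
\begin{bmatrix} p \\ \omega \end{bmatrix}
\tag{5.9}
$$

The time constant $\tau_1$ is the pressure residual time constant. The time constant $\tau_2$ is the 
tachometer residual time constant. In effect, we cancel out the dynamics of the plant and replace 
them with decoupled detection filter dynamics. These time constants should be shorter than the 
time constants in the dynamical model so that we detect failures quickly. However, they need 
to be at least twice as long as the sampling period to prevent numerical instabilities. 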
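The cancellation described by Equations 5.8 and 5.9 is easy to confirm numerically. The plant parameters and residual time constants in this sketch are assumed values for illustration.

```matlab
% Assumed plant parameters and residual time constants (illustrative values).
kT = 2;  tauP = 10;  tauT = 40;
a   = [-1/tauP 0; kT/tauT -1/tauT];
tau = [0.3; 0.3];            % tau1 and tau2, sec

D       = a + diag(1./tau);  % gain matrix from Equation 5.8
aFilter = a - D;             % matrix multiplying the estimate in Equation 5.9
lambda  = eig(aFilter);      % decoupled filter poles
disp(aFilter)
```

The result is diagonal with entries -1/tau1 and -1/tau2, so each estimate decays with its own time constant and the filter is stable for any positive choice of the two values.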
We will write a function, DetectionFilter.m, with three actions: an initialize case, 
an update case, and a reset case. varargin is used to allow the three cases to have different 
input lists. The function signature is 

function d = DetectionFilter( action, varargin )

It can be called in three ways: 

>> d = DetectionFilter( 'initialize', d, tau, dT )
>> d = DetectionFilter( 'update', u, y, d )
>> d = DetectionFilter( 'reset', d )


The first initializes the function, the second is called in each time step for an update, and 
the last resets the filter. All data is stored in the data structure d. 

The function simulates detecting failures of an air turbine. An air turbine has a constant 
pressure air source that sends air through a duct that drives the turbine blades. The turbine is 
attached to a load. The air turbine model is linear. Failures are modeled by multiplying the 
regulator input and tachometer output by a constant. A constant of 0 is a total failure and 1 is 
perfect operation. 

The filter is built and initialized in the following code in DetectionFilter. The continuous 
state space model of the plant, in this case our linear air turbine model, is an input. The 
selected time constants τ are also an input, and they are added to the plant model as in Equation 
5.8. The function discretizes the plant a and b matrices and the computed detection filter gain 
matrix d. 


DetectionFilter.m 


  case 'initialize'
    d   = varargin{1};
    tau = varargin{2};
    dT  = varargin{3};

    % Design the detection filter
    d.d = d.a + diag(1./tau);

    % Discretize both
    [d.a,d.b] = CToDZOH( d.a, d.b, dT );
    d.d       = CToDZOH( d.d, d.b, dT );

    % Initialize the state
    m   = size(d.a,1);
    d.x = zeros(m,1);
    d.r = zeros(m,1);


The update for the detection filter is in the same function. Note the equations implemented 
as described in the header. 


  case 'update'
    u   = varargin{1};
    y   = varargin{2};
    d   = varargin{3};
    r   = y - d.c*d.x;
    d.x = d.a*d.x + d.b*u + d.d*r;
    d.r = r;

Finally, we create a reset action to allow us to reset the residual and state values for the 
filter in between simulations. 

  case 'reset'
    d   = varargin{1};
    m   = size(d.a,1);
    d.x = zeros(m,1);
    d.r = zeros(m,1);


5.2 Simulating Fault Detection 
5.2.1 Problem 


We want to simulate a failure in the plant and demonstrate the performance of the failure 
detection. 


5.2.2 Solution 


We will build a MATLAB script that designs the detection filter using the function from the 
previous recipe and then simulates it with a user selectable pressure regulator or tachometer 
failure. The failure can be total or partial. 


5.2.3 How It Works 


DetectionFilterSim designs a detection filter using DetectionFilter from the previous 
recipe and implements it in a loop. A Runge-Kutta numerical integration algorithm propagates 
the continuous-time right-hand side of the air turbine, RHSAirTurbine. The detection filter 
is discrete time. 

The script has two scale factors uF and tachF that multiply the regulator input and the 
tachometer output to simulate failures. Setting a scale factor to zero is a total failure, and 
setting it to one indicates that the device is working perfectly. If we fail one, we expect the 
associated residual to be nonzero and the other to stay at zero. 
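The expected residual pattern can be reproduced with a stripped-down simulation. This sketch is not the book's script: it uses forward Euler steps for both the plant and the filter instead of RungeKutta and the discretized filter, and the plant parameters, input, and step size are assumed values.

```matlab
% Assumed plant parameters, input, and step size (illustrative values only).
kP = 1;  kT = 2;  tauP = 10;  tauT = 40;
a  = [-1/tauP 0; kT/tauT -1/tauT];
b  = [kP/tauP; 0];
D  = a + diag(1./[0.3;0.3]);   % gain from Equation 5.8, tau1 = tau2 = 0.3 sec
dT = 0.02;  n = 5000;  u = 100;

cases = [1 0; 0 1];            % rows are [uF tachF]: failed tachometer, failed regulator
rEnd  = zeros(2,2);            % final residuals, one column per case
for c = 1:2
  uF = cases(c,1);  tachF = cases(c,2);
  x = [0;0];  xHat = [0;0];
  for k = 1:n
    y    = [x(1); tachF*x(2)];              % measurements with sensor failure
    r    = y - xHat;                        % residual vector
    xHat = xHat + dT*(a*xHat + b*u + D*r);  % Euler step of the detection filter
    x    = x    + dT*(a*x + b*uF*u);        % Euler step of the plant with actuator failure
  end
  rEnd(:,c) = r;
end
disp(rEnd)
```

In the first column only the tachometer residual is nonzero, and in the second only the pressure residual is, which is the isolation property the gain matrix was designed to give.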


DetectionFilterSim.m


%% Script to simulate a detection filter
% Simulates detecting failures of an air turbine. An air turbine has a constant
% pressure air source that sends air through a duct that drives the turbine
% blades. The turbine is attached to a load.
%
% The air turbine model is linear. Failures are modeled by multiplying the
% regulator input and tachometer output by a constant. A constant of 0 is a
% total failure and 1 is perfect operation.

% Time constants for failure detection
tau1 = 0.3; % sec
tau2 = 0.3; % sec

% State space system
d = RHSAirTurbine;

%% Initialization
dT = 0.02; % sec
n  = ceil(tEnd/dT);

% Initial state
x = [0;0];

%% Detection Filter design
dF = DetectionFilter('initialize',d,[tau1;tau2],dT);

%% Run the simulation

% Control. This is the regulator input.
u = 100;

% Plotting array
xP = zeros(4,n);
t  = (0:n-1)*dT;

for k = 1:n
  % Measurement vector including measurement failure
  y       = [x(1);tachF*x(2)]; % Sensor failure
  xP(:,k) = [x;dF.r];

  % Update the detection filter
  dF      = DetectionFilter('update',u,y,dF);

  % Integrate one step
  d.u     = uF*u; % Actuator failure
  x       = RungeKutta( @RHSAirTurbine, t(k), x, dT, d );
end

%% Plot the states and residuals
[t,tL] = TimeLabel(t);
yL     = {'p' '\omega' 'Residual P' 'Residual \omega' };
tTL    = 'Detection Filter Simulation';
PlotSet( t, xP, 'x label', tL, 'y label', yL, 'figure title', tTL );


In Figure 5.4 the regulator fails and its residual is nonzero. In Figure 5.5 the tachometer
fails and its residual is nonzero. The residuals clearly show what has failed. Simple boolean
logic (i.e., if/end statements) is all that is needed to identify the fault, which means that we
don't need machine learning for this problem. The goal here is to show that machine learning
can recognize the faults. This will demonstrate that it has the potential to be coupled with
more complex systems.

Figure 5.4: Air turbine response to a failed regulator.

A detection filter is a type of filter. It filters out nonfailures, much like a low-pass filter
filters out noise. Adding any type of filter stage to a deep learning system can enhance its
performance. Of course, as with any filter, one needs to be careful not to filter out information
needed by the learning system. For example, suppose your dynamical system were an
oscillator. If the low-pass filter cutoff were below the oscillation frequency, the oscillation
would be strongly attenuated, and you would learn little or nothing about it.
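
The oscillator remark can be made concrete with a short sketch. The sample period, the 10 Hz oscillation, and the 1 Hz cutoff below are assumed values chosen only for illustration, and the first-order discrete filter is a generic low-pass, not a filter from this book's toolbox.

```matlab
% Assumed values for illustration only
dT   = 1e-3;     % sample period (s)
fOsc = 10;       % oscillation frequency (Hz)
fCut = 1;        % low-pass cutoff (Hz), below the oscillation frequency

t = (0:dT:10)';
u = sin(2*pi*fOsc*t);          % unit-amplitude oscillation

% First-order discrete low-pass filter
alpha = exp(-2*pi*fCut*dT);
y     = zeros(size(u));
for k = 2:numel(u)
  y(k) = alpha*y(k-1) + (1 - alpha)*u(k);
end

% Amplitude after the transient has died out, and the continuous-time prediction
gainSim    = max(abs(y(end-999:end)));
gainTheory = 1/sqrt(1 + (fOsc/fCut)^2);

fprintf('Simulated gain %6.4f, first-order prediction %6.4f\n', gainSim, gainTheory);
```

Only about a tenth of the oscillation amplitude survives the filter in this case, so a learning system fed with y would see very little of the oscillation.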
Figure 5.5: Air turbine response to a failed tachometer. The residuals immediately reach
values that indicate the failure because the filter is fast.


5.3 Testing and Training
5.3.1 Problem 


We want a neural network to characterize faults. Both tachometer and regulator failures will be 
characterized. 


5.3.2 Solution 


We use the same approach as we did for the XOR problem in Chapter 2. The outputs from the 
detection filter are classified. This could be done by simple boolean logic. The point is to show 
that a neural net can solve the same problem. 

5.3.3 How It Works


We run the simulation to get all possible residuals and combine them in a 2-by-4 residual
array, with one column per case. Our outputs are strings for the four cases: "none," "both,"
"tach," and "regulator." You can use strings instead of numbers as the classifier labels,
which is better than using integers and then converting them to strings. This just makes the
code cleaner and easier to understand. Someone else working on your code is less likely to
misinterpret the outputs.

We use feedforwardnet to implement the neural net. It has two layers, two inputs from the
detection filter, and one output, the status of the system. We first train the system with 600
randomly selected cases. The training pairs are a random selection from the four possible
sets using randi. We then simulate the network.
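
The random selection of training pairs can be sketched on its own. This sketch is an illustration under stated assumptions, not the book's listing: the residual values are rounded placeholders, and the targets are stored as numeric class indices that are decoded through the label array, because a numeric target array cannot hold text.

```matlab
rng(0);   % fixed seed so that the sketch is repeatable

% Columns are [none both tach regulator]; rounded placeholder residuals
residual = [0  0.19  0     0.19;...
            0  0    -0.09  0   ];
label    = ["none" "both" "tach" "regulator"];

n = 600;            % number of training pairs
x = zeros(2,n);     % inputs: the two residuals
y = zeros(1,n);     % targets: class index into label

for k = 1:n
  j      = randi(4);          % pick one of the four cases
  x(:,k) = residual(:,j);
  y(k)   = j;
end

% Decode the first few targets back to their names
disp(label(y(1:5)))
```

Keeping the names in a separate string array means the network only sees numbers, while every printout can still use the readable labels.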


DetectionFilterNN.m


%% Train the neural net
% Cases
% 2 layers
% 2 inputs
% 1 output

net = feedforwardnet(2);

% [none both tach regulator]
residual = [0  0.18693851  0           0.18693851;...
            0 -0.00008143 -0.09353033 -0.00008143];

% labels is a strings array
label = ["none" "both" "tach" "regulator"];

% How many sets of inputs
n = 600;

% This determines the number of inputs and outputs
x = zeros(2,n);
y = zeros(1,n);

% Create training pairs
for k = 1:n
  j      = randi([1,4]);
  x(:,k) = residual(:,j);
  y(k)   = label(j);
end

net      = configure(net, x, y);
net.name = 'DetectionFilter';
net      = train(net,x,y);
c        = sim(net,residual);

fprintf('\nRegulator Tachometer Failed\n');
for k = 1:4
  fprintf('%9.2e %9.2e %s\n',residual(1,k),residual(2,k),label(k));
end

% This only works for feedforwardnet(2);
fprintf('\nHidden layer biases %6.3f %6.3f\n',net.b{1});
fprintf('Output layer bias %6.3f\n',net.b{2});
fprintf('Input layer weights %6.2f %6.2f\n',net.IW{1}(1,:));
fprintf(' %6.2f %6.2f\n',net.IW{1}(2,:));
fprintf('Output layer weights %6.2f %6.2f\n',net.LW{2,1}(1,:));


The training GUI is shown in Figure 5.6. 

The GUI buttons are described in detail in Chapter 2 in the XOR problem. 

The results are shown in the following. The neural net works quite well. The printout in
the command window shows that it uses two types of activation functions: the hidden layer
uses tansig, and the output layer uses a linear activation function (purelin).


>> DetectionFilterNN


Regulator Tachometer Failed
 0.00e+00  0.00e+00 none
 1.87e-01 -8.14e-05 both
 0.00e+00 -9.35e-02 tach
 1.87e-01 -8.14e-05 regulator


Hidden layer biases -1.980 1.980 
Output layer bias 0.159 
Input layer weights DES sal, 3} 

OS salah 
Output layer weights -0.65 OR Al; 
Hidden layer activation function tansig 
Output layer activation function purelin 
Figure 5.6: The training GUI. The window (nntraintool) reports random data division
(dividerand), Levenberg-Marquardt training (trainlm), and mean squared error (mse)
performance, and ends with the message "Performance goal met."
CHAPTER 6


Tokamak Disruption Detection


6.1 Introduction 


Tokamaks are fusion machines that are under development to produce baseload power. Baseload 
power is power that is produced 24/7 and provides the base for powering the electric grid. The 
International Thermonuclear Experimental Reactor (ITER) is an international project that will
produce net power from a Tokamak. Net power means the Tokamak produces more energy than
it consumes. Consumption includes heating the plasma, controlling it, and powering all the 
auxiliary systems needed to maintain the plasma. It will allow researchers to study the physics 
of the Tokamak which will hopefully lead the way toward operational machines. A Tokamak 
is shown in Figure 6.1. The inner poloidal field coils act like a transformer to initiate a plasma 
current. The outer poloidal and toroidal coils maintain the plasma. The plasma current itself 
produces its own magnetic field and induces currents in the other coils. 

The image in Figure 6.1 was produced by the function DrawTokamak which calls DCoil 
and SquareHoop. We aren't going to discuss those three functions here. You should feel free 
to look through the functions as they show how easy it is to do 3D models using MATLAB. 

One problem with Tokamaks is disruptions. A disruption is a massive loss of plasma control 
that extinguishes the plasma and results in large thermal and structural loads on the Tokamak 
wall. This can lead to catastrophic wall damage. This would be bad in an experimental machine
and unacceptable in a power plant, as it could lead to months of repairs.

The factors that can be used to predict a disruption [22] are

1. The poloidal beta (beta is the ratio of plasma pressure to magnetic pressure).
2. The line-integrated plasma density.
3. The plasma elongation.
4. The plasma volume divided by the device minor radius.
5. The plasma current.
6. The plasma internal inductance.
7. The locked mode amplitude.
8. The plasma vertical centroid position.
9. The total input power.
10. The safety factor at the 95% flux surface. The safety factor is the ratio of the number of
times a magnetic field line travels toroidally (the long way around the doughnut) to the
number of times it travels poloidally (the short way). We want the safety factor to be
greater than 1.
11. The total radiated power.
12. The time derivative of the stored diamagnetic energy.

Figure 6.1: A Tokamak. There are three sets of coils. The inner poloidal field coil initializes the
plasma, and the poloidal and toroidal coils maintain the plasma. Some of the toroidal coils are
left out to make it easier to see the Tokamak center. [Figure labels: inner poloidal field coils
(primary transformer circuit); vacuum vessel containing plasma; outer poloidal field coils (for
plasma positioning and shaping); toroidal field coils.]


Locked modes are magnetohydrodynamic (MHD) instabilities that are locked in phase and 
in the laboratory frame. They can be precursors to disruptions. The plasma internal inductance 
is the inductance measured by integrating the inductance over the entire plasma. In a Tokamak, 
the poloidal direction is along the minor radius circumference. The toroidal direction is along 
the major radius circumference. In a plasma, the dipole moment due to the circulating current is 
in the opposite direction of the magnetic field which makes it diamagnetic. Diamagnetic energy 
is the energy stored in a magnetized plasma. Diamagnetic measurements measure this energy.

Our system is shown in Figure 6.2. Of all the factors listed above, we are only going to look
at the plasma vertical position in this example, along with the coil currents that drive it.
We'll find it to be more than complex enough! We will start with the dynamics of the vertical
motion of a plasma. We'll then learn about plasma disturbances. After that, we will design a
vertical position controller. Finally, we'll get to the deep learning.

Figure 6.2: Tokamak and control system. ELM means "Edge Localized Mode" and is a
disturbance. The plasma is shown in a poloidal cut through the torus. [Diagram labels:
Actuators, Plasma, Tokamak, Sensors, ELM, Controller, Deep Learning.]


6.2 Numerical Model 
6.2.1 Dynamics 


For our example, we need a numerical model of disruptions [9], [8], [26]. Ideally our model
would include all of the effects in the list given earlier. We use the model in Scibile [27]. We
will only consider vertical movement.

The equilibrium force on the plasma is induced by the magnetic field and current density in
the plasma.

\[ \mathbf{J} \times \mathbf{B} = \nabla p \tag{6.1} \]

where \(\mathbf{J}\) is current density, \(\mathbf{B}\) the magnetic field, and \(p\) the pressure. Pressure is force per unit
area on the plasma. The momentum balance is [2]

\[ \rho \frac{d\mathbf{v}}{dt} = \mathbf{J} \times \mathbf{B} - \nabla p \tag{6.2} \]

where \(\mathbf{v}\) is the plasma velocity and \(\rho\) is the plasma density. The imbalance causes plasma
motion, that is, when \(\mathbf{J} \times \mathbf{B} \neq \nabla p\). If we neglect the plasma mass, we get

\[ L_p^T I + A_{pp} z I_p = F_p \tag{6.3} \]

\(L_p\) is the mutual change inductance matrix of the coils. \(I\) is the vector of currents in the
Tokamak coils and in the conducting shell around the plasma. \(F_p\) is the external force
normalized to the plasma current \(I_p\). If we lump currents into active currents, driven by an
external voltage, and passive currents, we get a simplified model of the plasma. We need to add
Kirchhoff's voltage law to get a dynamical model.

\[ L \dot{I} + R I + L_p \dot{z} I_p = \Gamma V \tag{6.4} \]

\(\Gamma\) couples the voltages to the currents, \(L\) is the coil inductance matrix, and \(R\) is the coil
resistance. If we combine these, we get the state space matrices shown in the following. The
lumped model is shown in Figure 6.3.

Figure 6.3: Lumped parameter model. [Diagram labels: Control Amplifier, Active Loop,
Passive Loop, Plasma.]

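The dynamical equations are

\[ \frac{d}{dt}\begin{bmatrix} I_a \\ I_v \\ V_a \end{bmatrix} = A^s \begin{bmatrix} I_a \\ I_v \\ V_a \end{bmatrix} + B^s \begin{bmatrix} V_c \\ \dot{F}_p \\ F_p \end{bmatrix} \tag{6.5} \]

\[ z = C^s \begin{bmatrix} I_a \\ I_v \\ V_a \end{bmatrix} + D^s \begin{bmatrix} V_c \\ \dot{F}_p \\ F_p \end{bmatrix} \tag{6.6} \]

\[ A^s = \frac{1}{1 - k_{av}} \begin{bmatrix} -\dfrac{R_{aa}}{L_{aa}} & \dfrac{R_{vv} k_{av}}{L_{av}} & \dfrac{1}{L_{aa}} \\ \dfrac{R_{aa} k_{av}}{L_{av}} & -\dfrac{R_{vv}}{L_{vv}} \dfrac{k_{av} - M_{vp}}{1 - M_{vp}} & \dfrac{k_{av}}{L_{av}} \\ 0 & 0 & -\dfrac{1 - k_{av}}{\tau_t} \end{bmatrix} \tag{6.7} \]

\[ B^s = \begin{bmatrix} 0 & 0 & 0 \\ 0 & \dfrac{1}{L_{vp}(1 - M_{vp})} & 0 \\ \dfrac{1}{\tau_t} & 0 & 0 \end{bmatrix} \tag{6.8} \]

\[ C^s = \frac{1}{A_{pp} I_p} \begin{bmatrix} -L_{ap} & -L_{vp} & 0 \end{bmatrix} \tag{6.9} \]

\[ D^s = \frac{1}{A_{pp} I_p} \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \tag{6.10} \]

\[ k_{av} = \frac{L_{av}^2}{L_{aa} L_{vv}} \tag{6.11} \]

\[ M_{vp} = \frac{A_{pp} L_{vv}}{L_{vp}^2} \tag{6.12} \]

\[ M_{ap} = \frac{A_{pp} L_{aa}}{L_{ap}^2} \tag{6.13} \]

Table 6.1: Model parameters from the Joint European Torus (JET).

Parameter | Description                                                                    | Value          | Units
R_vv      | Passive coil resistance                                                        | 2.56 x 10^-3   | Ω
L_ap      | Mutual change inductance between the active coils and the plasma displacement  | 115.2 x 10^-6  | H/m
L_vp      | Mutual change inductance between the passive coils and the plasma displacement | 3.2 x 10^-6    | H/m
A_pp      | Normalized destabilizing force                                                 |                | H/m^2

N/A is Newton (unit of force) per Amp, in this case. Ω is Ohm, the unit of resistance; H is
Henry, the unit of inductance; A is amps.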


This includes a first-order lag to replace the pure delay in Scibile. The parameters used in the
simulation are given in Table 6.1. The preceding plasma dynamical equations may look very
mysterious, but they are really just a variation on a circuit with an inductor and a resistor, shown
in Figure 6.4.

The major addition is that the currents produce forces that move the plasma in z.

The equation for this circuit is

\[ L \frac{dI}{dt} + R I = V \tag{6.14} \]

which looks the same as our first-order lag. The first term is the voltage drop across the inductor
and the second the voltage drop across the resistor. If we suddenly apply a constant V, we get
the analytical solution

\[ I = \frac{V}{R}\left(1 - e^{-\frac{R}{L} t}\right) \tag{6.15} \]

L/R is the circuit's time constant \(\tau\). As \(t \to \infty\), we get the equation for a resistor, V = IR.
You will notice a lot of R/L's in Equation 6.7.
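
The step response of the RL circuit can be checked numerically. The following sketch integrates the circuit equation with a hand-written fourth-order Runge-Kutta step and compares the result with the analytical solution. The component values are hypothetical and chosen only to give a convenient time constant.

```matlab
% Hypothetical component values
L = 0.5;    % inductance (H)
R = 2;      % resistance (Ohm)
V = 1;      % applied step voltage (V)

tau = L/R;              % circuit time constant (s)
dT  = tau/100;          % integration step
t   = 0:dT:5*tau;
iL  = zeros(size(t));   % inductor current, starting at zero

f = @(i) (V - R*i)/L;   % dI/dt from L*dI/dt + R*I = V

for k = 1:numel(t)-1
  k1      = f(iL(k));
  k2      = f(iL(k) + 0.5*dT*k1);
  k3      = f(iL(k) + 0.5*dT*k2);
  k4      = f(iL(k) + dT*k3);
  iL(k+1) = iL(k) + dT*(k1 + 2*k2 + 2*k3 + k4)/6;
end

iExact = (V/R)*(1 - exp(-t/tau));   % analytical step response
err    = max(abs(iL - iExact));

fprintf('Max error %8.1e, final current %6.4f A, V/R = %6.4f A\n', err, iL(end), V/R);
```

After five time constants, the current is within one percent of V/R, which is the resistor-only limit.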
Figure 6.4: Series resistor and inductor circuit, an RL circuit.


6.2.2 Sensors 


We are going to assume that we can measure the vertical position and the two currents directly. 
This is not entirely the case in a real machine. The vertical position is measured indirectly in a 
real machine. We also assume we have available control voltage. 


6.2.3 Disturbances 


The disturbances are due to Edge Localized Modes (ELMs). An Edge Localized Mode is a
disruptive magnetohydrodynamic instability that occurs along the edges of a Tokamak plasma
due to steep plasma pressure gradients [18]. The strong pressure gradient is called the edge
pedestal. The edge pedestal improves plasma confinement time by a factor of 2 over the
low-confinement mode, and this high-confinement mode is now the preferred mode of
operation for Tokamaks. A simple model for an ELM is

\[ d = k\left(e^{-t/\tau_1} - e^{-t/\tau_2}\right) \tag{6.16} \]

d is the output of the ELM. It can be scaled by k based on the usage. For example, in Figure 6.5
it is scaled to show the output of a sensor. In our simulation, it is scaled to produce a driving
force on the plasma.

In this model \(\tau_1 > \tau_2\), and the ELMs appear randomly. The function ELM produces one ELM.
The simulation must call it with a new sequence of times to get a new ELM. Figure 6.5 shows
the results of the built-in demo in ELM. The function also computes the derivative since the
derivative of the disturbance is also an input.


ELM.m


function eLM = ELM( tau1, tau2, k, t )

% Constants from the reference
if( nargin < 3 )
  tau1 = 6.0e-4;
  tau2 = 1.7e-4;
k X 
end

% Reproduce the reference results
if( nargin < 4 )
  t = linspace(0,12e-3);
end

d = k*[ exp( -t/tau1 ) - exp( -t/tau2 );...
        exp( -t/tau2 )/tau2 - exp( -t/tau1 )/tau1 ];

if( nargout == 0 )
  PlotSet( t*1000, d, 'x label', 'Time (ms)', 'y label', {'d' 'dd/dt'},...
           'figure title', 'ELM' );
else
  eLM = d;
end


Figure 6.5: Edge Localized Mode. 
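
The shape of the ELM pulse follows directly from the two time constants. Setting the derivative of the ELM model to zero gives the time of the peak, \(t_{peak} = \frac{\tau_1 \tau_2}{\tau_1 - \tau_2} \ln(\tau_1/\tau_2)\). The sketch below checks this with the time constants used in ELM. The unit gain k = 1 is an assumption made only to keep the numbers simple.

```matlab
tau1 = 6.0e-4;   % ELM time constant 1 (s)
tau2 = 1.7e-4;   % ELM time constant 2 (s)
k    = 1;        % unit gain, assumed for illustration

% ELM output and its time derivative
d    = @(t) k*(exp(-t/tau1) - exp(-t/tau2));
dDot = @(t) k*(exp(-t/tau2)/tau2 - exp(-t/tau1)/tau1);

% Time at which the derivative is zero
tPeak = tau1*tau2/(tau1 - tau2)*log(tau1/tau2);

fprintf('Peak at %5.3f ms with d/k = %5.3f\n', 1e3*tPeak, d(tPeak)/k);
fprintf('Derivative at the peak: %8.1e\n', dDot(tPeak));
```

The pulse peaks at roughly 0.3 ms, which means a simulation time step must be well below that value to resolve the rise of the ELM.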
6.2.4 Controller


We will use a controller to control the vertical position of the plasma, which is otherwise
unstable. The controller will be a state space system using full state feedback.
The states are the two currents. Position, z, is controlled indirectly. We will use a linear
quadratic regulator in its continuous-time form. Using the continuous version just means that
we need to sample much faster than the range of frequencies for the control.


QCR.m


if( nargin < 1 )
  Demo
  return
end

bor = b/r;

[sinf,rr] = Riccati( [a,-bor*b';-q',-a'] );

if( rr == 1 )
  disp('Repeated roots. Adjust q or r');
end

k = r\(b'*sinf);


If you get repeated roots, you must manually adjust q or r, the state and control weights,
respectively. The matrix Riccati equation is solved in the subfunction Riccati. Notice the
use of unique to find repeated roots.


QCR.m

function [sinf, rr] = Riccati( g )

[w, e] = eig(g);

rg = size(g,1);

es = sort(diag(e));

% Look for repeated roots
if ( length(unique(es)) < length(es) )
  rr = 1;
else
  rr = 0;
end

% Sort the columns of w
ws = w(:,real(diag(e)) < 0);

sinf = real(ws(rg/2+1:rg,:)/ws(1:rg/2,:));
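The demo is for a double integrator. This is just

\[ \ddot{z} = u \tag{6.17} \]

In state space form, this becomes

\[ \begin{bmatrix} \dot{z} \\ \dot{v} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z \\ v \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u \tag{6.18} \]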


The states are z, position, and v, velocity. u is the input. The following is the built-in demo 
code. The demo shows that the function really creates a controller. 


a = [0 1;0 0];
b = [0;1];
q = eye(2);
r = 1;

k = QCR( a, b, q, r );

e = eig(a-b*k);

fprintf('\nGain = [%5.2f %5.2f]\n\n',k);
disp('Eigenvalues');
disp(e)

We chose the cost on the states and the control to all be 1.

>> QCR

Gain = [ 1.00  1.73]

Eigenvalues
  -0.8660 + 0.5000i
  -0.8660 - 0.5000i


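We compute the eigenvalues to show that the result is well behaved. Both eigenvalues have
negative real parts, so the closed loop is stable. The natural frequency is 1 rad/s, and the
damping ratio is 0.866, which is well damped although slightly short of critical damping (a
damping ratio of 1). These quantities refer to the second-order damped oscillator equation

\[ \ddot{x} + 2\zeta\omega\dot{x} + \omega^2 x = 0 \tag{6.19} \]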
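
The eigenvector approach to the Riccati equation can be reproduced in a few lines for the double integrator. This is a minimal sketch of the standard Hamiltonian-matrix method under the assumption of unit weights. It is written inline rather than calling QCR, and the variable names are chosen for this illustration.

```matlab
% Double integrator with unit state and control weights
a = [0 1;0 0];
b = [0;1];
q = eye(2);
r = 1;
n = size(a,1);

% Hamiltonian matrix for the continuous-time Riccati equation
g = [a, -b*(r\b'); -q, -a'];

% Eigenvectors of the stable eigenvalues span the solution
[w,e] = eig(g);
ws    = w(:, real(diag(e)) < 0);
sInf  = real(ws(n+1:2*n,:)/ws(1:n,:));

% Optimal gain and closed-loop properties
k     = r\(b'*sInf);
eCL   = eig(a - b*k);
wN    = abs(eCL(1));            % natural frequency (rad/s)
zeta  = -real(eCL(1))/wN;       % damping ratio

fprintf('Gain = [%5.2f %5.2f], natural frequency %5.3f rad/s, damping ratio %5.3f\n', k, wN, zeta);
```

For this problem the Riccati equation can also be solved by hand, giving a gain of [1, sqrt(3)]. This provides an independent check on the numerical result.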
6.3 Dynamical Model
6.3.1 Problem 


Create a dynamical model. 


6.3.2 Solution 


Implement the plasma dynamics model in a MATLAB function. 


6.3.3 How It Works 


We first code the right-hand side function. We create the four matrices in the 
DefaultDataStructure function. This makes the right-hand side really simple. 


RHSTokamak.m

function [xDot,z] = RHSTokamak( x, ~, d )

if( nargin < 3 )
  if( nargin == 1 )
    xDot = UpdateDataStructure(x);
  else
    xDot = DefaultDataStructure;
  end
  return;
end

u    = [x(3);d.eLM];
vDot = (d.vC - x(3))/d.tauT; % First-order lag: x(3) follows d.vC
xDot = [d.aS*x(1:2) + d.bS*u;vDot];
z    = d.cS*x(1:2) + d.dS*u;


function d = DefaultDataStructure

d = struct( 'lAA', 42.5e-3, 'lAV', 0.432e-3, 'lVV', 0.012e-3,...
            'rAA', 35.0e-3, 'rVV',2.56e-3,'lAP',115.2e-6,'lVP',3.2e-6,...
            'aPP',0.449e-6,'tauT',310e-6,'iP',1.5e6,'aS',[],'bS',[],'cS',[],...
            'dS',[],'eLM',0,'vC',0);

d = UpdateDataStructure( d );


function d = UpdateDataStructure( d )

kAV   = d.lAV^2/(d.lAA*d.lVV);
oMKAV = 1 - kAV;
kA    = 1/(d.lAA*oMKAV);
mVP   = d.aPP*d.lVV/d.lVP^2;
oMMVP = 1 - mVP;

if( mVP >= 1 )
  fprintf('mVP = %f should be less than 1 for an elongated plasma in a resistive vacuum vessel. aPP is probably too large\n',mVP);
end

if( kAV >= 1 )
  fprintf('kAV = %f should be less than 1 for an elongated plasma in a resistive vacuum vessel\n',kAV);
end

d.aS = (1/oMKAV)*[ -d.rAA/d.lAA d.rVV*kAV/d.lAV;...
                    d.rAA*kAV/d.lAV -(d.rVV/d.lVV)*(kAV - mVP)/oMMVP];
d.bS = [kA 0 0;kAV/(d.lAV*(1-kAV)) 1/(d.lVP*oMMVP) 0];
d.cS = -[d.lAP d.lVP]/d.aPP/d.iP;
d.dS = [0 0 1]/d.aPP/d.iP;
eAS  = eig(d.aS);

disp('Eigenvalues')
fprintf('\n Mode 1 %12.2f\n Mode 2 %12.2f\n',eAS);


If we type RHSTokamak at the command line, we get the default data structure. 


>> RHSTokamak
ans =

  struct with fields:

     lAA: 0.0425
     lAV: 4.3200e-04
     lVV: 1.2000e-05
     rAA: 0.0350
     rVV: 0.0026
     lAP: 1.1520e-04
     lVP: 3.2000e-06
     aPP: 4.0000e-07
    tauT: 3.1000e-04
      aS: [2x2 double]
      bS: [2x3 double]
      cS: [-288 -8]
      dS: [0 2500000 0]
     eLM: 0
      vC: 0


This is the four matrices plus all the constants. The two inputs, control voltage d.vC, and ELM
disturbances, d.eLM, are zero. If you have your own values of lAA, and so on, you can do


d      = RHSTokamak;
d.lAA  = 0.046;
d.tauT = 0.00035;
d      = RHSTokamak(d)


and it will create the matrices. There are two warnings to prevent you from entering invalid
parameters.
To see that the system really is unstable, type


>> RHSTokamak


Eigenvalues

 Mode 1        -2.67
 Mode 2       115.16
 Delay      -3225.81


These agree with the reference for JET. The third is the first-order lag. Note that 


>> d.aPP 
ans = 
4.4900e-07 


This value was chosen so that the roots match the JET numbers. 
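
The instability can be checked independently of RHSTokamak by building the 2-by-2 current dynamics matrix from the parameter values and looking at its eigenvalues. This sketch copies the formulas used in UpdateDataStructure. The local variable names are chosen for this illustration.

```matlab
% Parameter values from the default data structure
lAA = 42.5e-3;   lAV = 0.432e-3;  lVV = 0.012e-3;   % inductances (H)
rAA = 35.0e-3;   rVV = 2.56e-3;                     % resistances (Ohm)
lVP = 3.2e-6;    aPP = 0.449e-6;

% Coupling terms; both must be less than 1
kAV = lAV^2/(lAA*lVV);
mVP = aPP*lVV/lVP^2;

% Dynamics matrix for the active and passive currents
aS = (1/(1 - kAV))*[ -rAA/lAA            rVV*kAV/lAV;...
                      rAA*kAV/lAV       -(rVV/lVV)*(kAV - mVP)/(1 - mVP)];

e = sort(eig(aS));
fprintf('kAV = %5.3f, mVP = %5.3f\n', kAV, mVP);
fprintf('Eigenvalues: %10.2f %10.2f\n', e);
```

Because mVP is larger than kAV, the lower right element of aS is positive. That term produces the positive eigenvalue, which is the unstable vertical mode.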


6.4 Simulate the Plasma 
6.4.1 Problem 


We want to simulate the vertical position dynamics of the plasma with ELM disturbances. 


6.4.2 Solution 


Write a simulation script called DisruptionSim. 

6.4.3 How It Works


The simulation script is an open-loop simulation of the plasma. 


DisruptionSim.m


%% Constants
d       = RHSTokamak;
tau1ELM = 6.0e-4;  % ELM time constant 1
tau2ELM = 1.7e-4;  % ELM time constant 2
kELM    = 1.5e-6;  % ELM gain matches Figure 2.9 in Reference 2
tRepELM = 48e-3;   % ELM repetition time (s)

%% The control sampling period and the simulation integration time step
dT = 1e-4;

%% Number of sim steps

12 nSim =S ALO 



%% Plotting array
xPlot = zeros(7,nSim);

%% Initial conditions
x    = [0;0;0];  % State is zero
t    = 0;        % Time
tRep = 0.001;    % Time for the 1st ELM
tELM = inf;      % Prevents an ELM at the start
zOld = 0;        % For the first difference rate equation

%% Run the simulation
for k = 1:nSim
  d.vC  = 0;
  d.eLM = ELM( tau1ELM, tau2ELM, kELM, tELM );
  tELM  = tELM + dT;

  % Trigger another ELM
  if( t > tRep + rand*tRepELM )
    tELM = 0;
    tRep = t;
  end

  x          = RK4( @RHSTokamak, x, dT, t, d );
  [~,z]      = RHSTokamak( x, t, d );
  t          = t + dT;
  zDot       = (z - zOld)/dT;
  zOld       = z; % Save the position for the next first difference
  xPlot(:,k) = [x;z;zDot;d.eLM];
end

%% Plot the results
tPlot = dT*(0:nSim-1)*1000;

yL = {'I_A' 'I_V' 'v' 'z (m)' 'zDot (m/s)' 'ELM' 'ELMDot'};

k = [ab 2:4 Sis 

PlotSet( tPlot, xPlot(k,:), 'x label', 'Time (ms)', 'y label', yL(k),...
         'figure title', 'Disruption Simulation' );

k = (i; Bl; 

PlotSet( tPlot, xPlot(k,:), 'x label', 'Time (ms)', 'y label', yL(k),...
         'figure title', 'ZDot and ELM' );


It prints out the eigenvalues for reference to make sure the dynamics work correctly. 


>> DisruptionSim


Eigenvalues

 Mode 1        -2.67
 Mode 2       115.16
 Delay      -3225.81


The value for the magnitude of the ELMs was found by running the simulation, looking
at the magnitude of z, and matching it to the results in the reference.


tRepELM = 48e-3; % ELM repetition time (s)


The value of the time derivative of the plasma vertical position z is just the first difference.


zDot = (z - zOld)/dT;


The ELMs are triggered randomly inside of the simulation loop. tRep is the time of the
last ELM. A new ELM starts when the simulation time exceeds tRep plus a random fraction
of tRepELM.


if( t > tRep + rand*tRepELM )
  tELM = 0;
  tRep = t;
end


The results are shown in Figure 6.6. The currents grow with time due to the positive
eigenvalue. The only disturbance is the ELMs, but they are enough to cause the vertical
position to grow.
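
The random trigger is easy to study in isolation. The sketch below runs only the trigger logic and records the times at which ELMs start. The number of steps and the fixed random seed are assumptions made for this illustration.

```matlab
rng(0);            % fixed seed so that the sketch is repeatable

dT      = 1e-4;    % time step (s)
tRepELM = 48e-3;   % ELM repetition time (s)
nSim    = 5000;    % number of steps, assumed

t     = 0;
tRep  = 0.001;     % time for the first ELM
tTrig = [];        % times at which an ELM is triggered

for k = 1:nSim
  if( t > tRep + rand*tRepELM )
    tTrig(end+1) = t; %#ok<AGROW>
    tRep         = t;
  end
  t = t + dT;
end

gap = diff(tTrig);
fprintf('%d ELMs, mean gap %5.2f ms, longest gap %5.2f ms\n',...
        numel(tTrig), 1e3*mean(gap), 1e3*max(gap));
```

Because rand is redrawn at every step, the gap between ELMs is not uniformly distributed between zero and tRepELM. Short gaps are much more likely, no gap can exceed tRepELM by more than a step, and the typical gap depends on the time step dT.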


6.5 Control the Plasma 
6.5.1 Problem 


We want to control the plasma vertical position. 
Figure 6.6: Plasma simulation. The position is unstable.


6.5.2 Solution


Write a simulation script called ControlSim to demonstrate closed loop control of the vertical
position of the plasma.


6.5.3 How It Works 


ControlSim is a closed loop simulation of the plasma. We added the controller with the 
gains computed by QCR. 


ControlSim.m


%% Constants
d         = RHSTokamak;
tau1ELM   = 6.0e-4;  % ELM time constant 1
tau2ELM   = 1.7e-4;  % ELM time constant 2
kELM      = 1.5e-6;  % ELM gain matches Figure 2.9 in Reference 2
tRepELM   = 48e-3;   % ELM repetition time (s)
controlOn = true;
vCMax     = 3e-4;

%% The control sampling period and the simulation integration time step
dT = 1e-5;

%% Number of sim steps
nSim = 20000;

%% Plotting array
xPlot = zeros(8,nSim);

%% Initial conditions
x    = [0;0;0];
t    = 0;
tRep = 0.001; % Time for the 1st ELM
tELM = inf;   % This value will be changed after the first ELM
zOld = 0;     % For the rate equation
z    = 0;

%% Design the controller
kControl = QCR( d.aS, d.bS(:,1), eye(2), 1 );

%% Run the simulation
for k = 1:nSim
  if( controlOn )
    d.vC = -kControl*x(1:2);
    if( abs(d.vC) > vCMax )
      d.vC = sign(d.vC)*vCMax;
    end
  else
    d.vC = 0; %#ok<UNRCH>
  end

  d.eLM = ELM( tau1ELM, tau2ELM, kELM, tELM );
  tELM  = tELM + dT;

  % Trigger another ELM
  if( t > tRep + rand*tRepELM )
    tELM = 0;
    tRep = t;
  end

  x          = RK4( @RHSTokamak, x, dT, t, d );
  [~,z]      = RHSTokamak( x, t, d ); % Get the position
  t          = t + dT;
  zDot       = (z - zOld)/dT; % The rate of the vertical position
  zOld       = z;             % Save the position for the next rate calculation
  xPlot(:,k) = [x;z;zDot;d.eLM;d.vC];
end


The controller is implemented in the loop. It applies the limiter, which keeps the magnitude
of the control voltage at or below vCMax.


  if( controlOn )
    d.vC = -kControl*x(1:2);
    if( abs(d.vC) > vCMax )
      d.vC = sign(d.vC)*vCMax;
    end
  else
    d.vC = 0; %#ok<UNRCH>
  end

  d.eLM = ELM( tau1ELM, tau2ELM, kELM, tELM );


Results are shown in Figure 6.7. 
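
The limiter used in the control loop can be sketched on its own. The example below applies the sign-based limit to a range of hypothetical control demands and compares it with an equivalent clamp written with min and max. The test vector is an assumption made for this illustration.

```matlab
vCMax = 3e-4;                      % control voltage limit
v     = linspace(-1e-3,1e-3,11);   % hypothetical unlimited control demands

% Sign-based limiter, as in the control loop
vSign      = v;
idx        = abs(v) > vCMax;
vSign(idx) = sign(v(idx))*vCMax;

% Equivalent clamp written with min and max
vClamp = max(min(v,vCMax),-vCMax);

disp([v;vSign;vClamp]')
```

Both forms leave demands inside the limits unchanged and hold the rest at plus or minus vCMax. The min and max form works element by element, which is convenient if the control ever becomes a vector.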


6.6 Training and Testing 
6.6.1 Problem 


We want to detect measurements leading up to disruptions. 


6.6.2 Solution 


We use a BiLSTM (bidirectional long short-term memory) layer to detect disruptions by
classifying a time sequence as leading up to a disruption or not. LSTMs are designed to avoid
the long-term dependency problem, that is, the difficulty a standard RNN has in making use
of old information. A standard RNN has a repeating structure. An LSTM also has a repeating
structure, but each element has four layers. The LSTM layers decide what old information
to pass on to the next layer. It may be all, or it may be none. There are many variants
on LSTM, but they all include the fundamental ability to forget things. A BiLSTM processes
the sequence both forward and backward, so it is generally better than an LSTM when we
have the full time sequence.

Figure 6.7: Plasma simulation. The position is now bounded. Compare this to Figure 6.6.


6.6.3 How It Works


The following script, TokamakNeuralNet.m, generates the test and training data, trains the
neural net, and tests it. The constants are initialized first.


TokamakNeuralNet.m

%% Constants
d         = RHSTokamak;
tau1ELM   = 6.0e-4;  % ELM time constant 1
tau2ELM   = 1.7e-4;  % ELM time constant 2
kELM      = 1.5e-6;  % ELM gain matches Figure 2.9 in Reference 2
tRepELM   = 48e-3;   % ELM repetition time (s)
controlOn = true;    % Turns on the controller
disThresh = 1.6e-6;  % This is the threshold for a disruption

% The control sampling period and the simulation integration time step
dT = 1e-5;

% Number of sim steps
nSim = 2000;

% Number of tests
n         = 100;
sigma1ELM = 2e-6*abs(rand(1,n));

PlotSet(1:n,sigma1ELM,'x label','Test Case','y label','1 \sigma ELM Value');

zData = zeros(1,nSim); % Storage for vertical position

We design the controller as we did in ControlSim. The script runs 100 simulations. The
linear quadratic controller demonstrated in ControlSim controls the position.


%% Initial conditions
x    = [0;0;0]; % The state of the plasma
tRep = 0.001;   % Time for the 1st ELM

%% Design the controller
kControl = QCR( d.aS, d.bS(:,1), eye(2), 1 );

s = cell(n,1);

%% Run n simulations
for j = 1:n
  % Run the simulation
  t    = 0;
  tELM = inf;          % Prevents an ELM at the start
  kELM = sigma1ELM(j);
  tRep = 0.001;        % Time for the 1st ELM

  for k = 1:nSim
    if( controlOn )
      d.vC = -kControl*x(1:2);
    else
      d.vC = 0; %#ok<UNRCH>
    end
    d.eLM = ELM( tau1ELM, tau2ELM, kELM, tELM );
    tELM  = tELM + dT;

    % Trigger another ELM
    if( t > tRep + rand*tRepELM )
      tELM = 0;
      tRep = t;
    end

    x          = RK4( @RHSTokamak, x, dT, t, d );
    [~,z]      = RHSTokamak( x, t, d );
    t          = t + dT;
    zData(1,k) = z;
  end
  s{j} = zData;
end

clear c


A disruption is any time response with z peaks over the threshold. In the script, each case is
labeled by comparing its ELM magnitude, sigma1ELM, with disThresh. Figure 6.8 shows the
responses and the distribution of standard deviations. The blue line is a simulation that failed
to keep the vertical displacement of the plasma under the prescribed threshold, and the red
line is a simulation that succeeded in keeping the displacement below that threshold. The
classification criteria are set in the following code.


%% Classify the results
j     = find(sigma1ELM > disThresh);
jN    = find(sigma1ELM < disThresh);
c(j)  = 1;
c(jN) = 0;

[t,tL] = TimeLabel((0:nSim-1)*dT);
PlotSet(t,[s{j(1)};s{jN(1)}],'x label',tL,'y label','z (m)','Plot Set',...
        {1:2},'legend',{{'disruption','stable'}});
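
Before training, it is worth looking at how the labels are distributed. The sketch below repeats only the labeling and the 80/20 split, without running the plasma simulation. The fixed random seed and the majority-class baseline are additions made for this illustration.

```matlab
rng(1);   % fixed seed so that the sketch is repeatable

n         = 100;                    % number of cases
disThresh = 1.6e-6;                 % disruption threshold
sigma1ELM = 2e-6*abs(rand(1,n));    % ELM magnitude for each case

% Label each case: 1 is a disruption, 0 is stable
c = double(sigma1ELM > disThresh);

% Same 80/20 split as the training script
nTrain = floor(0.8*n);
yTrain = categorical(c(1:nTrain));
yTest  = categorical(c(nTrain+1:n));

% Accuracy of a classifier that always predicts the more common class
fracDis  = mean(c);
baseline = max(fracDis, 1 - fracDis);

fprintf('Disruption fraction %4.2f, majority-class baseline %4.2f\n', fracDis, baseline);
```

Since the magnitudes are drawn uniformly between zero and 2e-6, only about one case in five is expected to exceed the threshold. A network's test accuracy should therefore be judged against the baseline of always predicting "stable," not against 50%.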


The training is done next.


%% Divide into training and testing data
nTrain = floor(0.8*n); % Train on 80% of the cases
xTrain = s(1:nTrain);
yTrain = categorical(c(1:nTrain));
xTest  = s(nTrain+1:n);
yTest  = categorical(c(nTrain+1:n));

%% Train the neural net
numFeatures    = 1; % Just the plasma position
numClasses     = 2; % Disruption or non disruption
numHiddenUnits = 200;

layers = [ ...
    sequenceInputLayer(numFeatures)
    bilstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
disp(layers)

options = trainingOptions('adam', ...
    'MaxEpochs',60, ...
    'GradientThreshold',2, ...
    'Verbose',0, ...
    'Plots','training-progress');

net = trainNetwork(xTrain,yTrain,layers,options);


Figure 6.8: Time responses and distribution of 1-sigma values.

Figure 6.9: Training. [Training Progress window, 29-Sep-2019 10:15:49.]


The training is shown in Figure 6.9. 
The testing is done next. 


%% Demonstrate the neural net

%%% Test the network
yPred = classify(net,xTest);

% Calculate the classification accuracy of the predictions.
acc = sum(yPred == yTest)./numel(yTest);
disp('Accuracy')
disp(acc);


The results are encouraging. ITER will require 95% of disruption predictions to be correct 
and to present an alarm 30 ms before a disruption [25]. Good results have been obtained using 
data from DIII-D [17]. 
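
A requirement on disruption predictions is not quite the same as overall accuracy, so it can help to count detections and false alarms separately. The following is a minimal sketch using made-up labels (1 for disruption, 0 for stable). The vectors and the variable names `detected` and `falseAlarm` are our own illustrations, not output from TokamakNeuralNet.

```matlab
% Hypothetical labels: 1 = disruption, 0 = stable (not the book's data)
yTest = [1 0 1 1 0 0 1 0];
yPred = [1 0 0 1 0 1 1 0];

% Overall accuracy, computed the same way as in the listing above
acc = sum(yPred == yTest)/numel(yTest);

% Fraction of true disruptions that were predicted
detected = sum(yPred == 1 & yTest == 1)/sum(yTest == 1);

% Fraction of stable cases that raised a false alarm
falseAlarm = sum(yPred == 1 & yTest == 0)/sum(yTest == 0);

fprintf('Accuracy %.2f  Detected %.2f  False alarms %.2f\n',...
  acc,detected,falseAlarm);
```

With real data, yTest and yPred would be the categorical arrays from the test set, converted to numbers or compared by category.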
>> TokamakNeuralNet


Eigenvalues 
Mode 1 Seon, 
Mode 2 ALAS dis 


  5x1 Layer array with layers:

     1   ''   Sequence Input          Sequence input with 1 dimensions
     2   ''   BiLSTM                  BiLSTM with 200 hidden units
     3   ''   Fully Connected         2 fully connected layer
     4   ''   Softmax                 softmax
     5   ''   Classification Output   crossentropyex
Accuracy
    0.7500


This chapter did not deal with recursive or online training. A disruption predictor would need
to constantly incorporate new data into its neural network. In addition, the other criteria for
disruption detection would also need to be incorporated.
CHAPTER 7


Classifying a Pirouette


7.1 Introduction 


A pirouette is a familiar step in ballet. There are many types of pirouettes. We will focus on 
an en dehors (outside) pirouette from fourth position. The dancer pliés (does a deep knee bend) 
then straightens her legs producing both an upward force to get on the tip of her pointe shoe 
and a torque to turn about her axis of revolution. 

In this chapter, we will classify pirouettes. Four dancers will each do ten double pirouettes, 
and we will use them to train the deep learning network. The network can then be used to 
classify pirouettes. 

This chapter will involve real-time data acquisition and deep learning. We will spend a 
considerable amount of time in this chapter creating software to interface with the hardware. 
While it is not deep learning, it is important to know how to get data from sensors for use in 
deep learning work. We give code snippets in this chapter. Only a few can be cut and pasted 
into the MATLAB command window. You'll need to run the software in the downloadable 
library. Also remember, you will need the Instrument Control Toolbox for this project. 

Our subject dancers are shown doing pirouettes in Figure 7.1. We have three female
dancers and one male dancer. Two of the women are wearing pointe shoes. Each dancer is shown
at the beginning, middle, and end of the turn. The measurements will be accelerations, angular
rates, and orientation. There really isn't any limit to the movements the dancers could do. We
asked them all to do double pirouettes starting from fourth position and returning to fourth
position. Fourth position is with one foot behind the other and separated by a quarter meter or
so. This is in contrast to fifth position where the feet are right against each other. All have
slightly different positions, though all are doing very good pirouettes. There is no one "right"
pirouette. If you were to watch the turns, you would not be able to see that they are that different.
The goal is to develop a neural network that can classify their pirouettes.

This kind of tool would be useful in any physical activity. An athlete could train a neural 
network to learn any important movement. For example, a baseball pitcher’s pitch could be 
learned. The trained network could be used to compare the same movement at any other time 
to see if it has changed. A more sophisticated version, possibly including vision, might suggest
how to fix problems or identify what has changed. This would be particularly valuable for
rehabilitation.

Figure 7.1: Dancers doing pirouettes. The stages are from left to right.

Table 7.1: LPMS-B2: 9-Axis Inertial Measurement Unit (IMU).

Parameter              Value
Bluetooth              2.1 + EDR / Low Energy (LE) 4.1
Orientation accuracy   < 0.5° (static), < 2° RMS (dynamic)
Gyroscope range        ±125 / ±245 / ±500 / ±1000 / ±2000 °/s, 16 bits

Figure 7.2: LPMS-B2: 9-Axis Inertial Measurement Unit (IMU). The on/off button is
highlighted on the right.


7.1.1 Inertial Measurement Unit 


Our sensor is the LPMS-B2 IMU, which communicates over Bluetooth. Its parameters are shown
in Table 7.1. The Bluetooth range is sufficient to work in a ballet studio.

The IMU has many other outputs that we will not use. A close-up of the IMU is shown in
Figure 7.2.

We will first work out the details of the data acquisition. We will then build a deep learning
algorithm to train the system and later to take data and classify the pirouette as being a pirouette
done by a particular dancer. We'll build up the data acquisition by first writing the MATLAB
code to acquire the data. Next, we will create functions to display the data and integrate
it all into a GUI. Finally, we will create the deep learning classification system.


7.1.2 Physics 


A pirouette is a complex multiflexible body problem. The pirouette is initiated by the dancer
doing a plié and then using her muscles to generate a torque about the spin axis and
forces to get onto her pointe shoe with her center of mass over it. Her muscles quickly stop the
translational motion so that she can focus on balancing as she is turning. The equation of
rotational motion is known as Euler's equation and is


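$$T = I\dot{\omega} + \omega^{\times} I \omega \qquad (7.1)$$

where $\omega$ is the angular rate and $I$ is our inertia. $T$ is our external torque. The external
torque is due to a push off the floor and gravity. $\omega^{\times}$ is the skew-symmetric matrix
form of the cross product. This is a vector equation. The vectors are $T$ and $\omega$. $I$ is a
3x3 matrix.

$$T = \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix} \qquad (7.2)$$

$$\omega = \begin{bmatrix} \omega_x \\ \omega_y \\ \omega_z \end{bmatrix} \qquad (7.3)$$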


Each component is a value about a particular axis. For example, $T_x$ is the torque about the
x-axis attached to the dancer. Figure 7.3 shows the system. We will only be concerned with
rotation, assuming all translational motion is damped. If the dancer's center of mass is not above
the box of her pointe shoe, she will experience an overturning torque.

The dynamical model is three first-order coupled differential equations. Angular rate $\omega$ is
the state, that is, the quantity being differentiated. The equation says that the external torque (due
to pushing off the floor or due to pointe shoe drag) is equal to the inertia times the angular
acceleration plus the Euler coupling term. This equation assumes that the body is rigid. For a
dancer, it means she is rotating and no part is moving with respect to any other part. Now in a
proper pirouette, this is never true if you are spotting! But let's suppose you are one of those
dancers who don't spot. Let's forget about the angular rate coupling term, which only matters if
the angular rate is large. Let's just look at the remaining terms, which are $T = I\dot{\omega}$.
Expanded, this is


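$$\begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix} =
\begin{bmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{xy} & I_{yy} & I_{yz} \\ I_{xz} & I_{yz} & I_{zz} \end{bmatrix}
\begin{bmatrix} \dot{\omega}_x \\ \dot{\omega}_y \\ \dot{\omega}_z \end{bmatrix} \qquad (7.4)$$

Let's look at the equation for $T_z$. We just multiply the third row of the inertia matrix times
the angular acceleration vector.

$$T_z = I_{xz}\dot{\omega}_x + I_{yz}\dot{\omega}_y + I_{zz}\dot{\omega}_z \qquad (7.5)$$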
This means a torque around the z-axis influences the angular rates about all 3 axes.
For a pure angular acceleration about the z-axis, we can write out the required torque.

$$\begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix} =
\begin{bmatrix} I_{xz} \\ I_{yz} \\ I_{zz} \end{bmatrix} \dot{\omega}_z \qquad (7.6)$$

Figure 7.3: The center of mass of the dancer. External forces act at the center of mass. Rotations
are about the center of mass.


This is the perfect pirouette push off because it only creates rotation about the vertical axis, 
which is what we want in a pirouette. To this you need to add the forces needed to get on pointe 
with your center of mass over your pointe shoe tip. 
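
As a numerical illustration of the perfect push off, the sketch below multiplies an inertia matrix by a pure z-axis angular acceleration. The inertia values, in particular the products of inertia, and the acceleration are assumed for illustration and are not measurements of any dancer.

```matlab
% Assumed inertia matrix (kg-m^2); the off-diagonal terms are made up
inertia = [8.4 0.1 0.2; 0.1 8.4 0.05; 0.2 0.05 0.56];

% Desired angular acceleration, about z only (rad/s^2)
wDotZ = 2.0;
wDot  = [0;0;wDotZ];

% Required torque from T = I*wDot with the coupling term neglected
torque = inertia*wDot;

% The same result is the third column of the inertia matrix times wDotZ
torqueColumn = inertia(:,3)*wDotZ;

disp(torque')
```

The x and y torque components are nonzero only because the products of inertia are nonzero. For a perfectly symmetric dancer, only a z torque would be needed.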

While turning, the only significant external torque is due to friction between the pointe shoe 
tip and the floor. Friction resists both turning motion and translational motion. You don't want 
a slight side force, perhaps due to a less than great partner, to cause you to slide. 

Our IMU measures angular rates and linear accelerations. Angular rates are the quantities
in Euler's equations. However, since the IMU is not at the dancer's center of mass, its
accelerometers will also sense the effect of angular acceleration along with the acceleration of
the center of mass. We locate it at the dancer's waist so it is not too far from the spin axis, but
it still sees a component.

$$a_\omega = \dot{\omega}^{\times} r_{IMU} \qquad (7.7)$$

where $r_{IMU}$ is the vector from the dancer's center of mass to the IMU.
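
To get a feel for the size of this effect, the sketch below evaluates the standard rigid-body expressions for the acceleration of a point offset from the center of mass. The offset, rate, and angular acceleration are assumed values, and the centripetal term is our addition from textbook kinematics rather than something derived in this chapter.

```matlab
% Assumed IMU offset from the center of mass (m)
r = [0.15;0;0];

% One turn per second about z (rad/s)
w = [0;0;2*pi];

% Assumed angular acceleration about z (rad/s^2)
wDot = [0;0;1];

% Acceleration due to angular acceleration
aTangential = cross(wDot,r);

% Acceleration due to the angular rate, which points back toward the spin axis
aCentripetal = cross(w,cross(w,r));

a = aTangential + aCentripetal;
disp(a')
```

At one turn per second the rate-dependent term is several meters per second squared at this offset, so it is not negligible compared to gravity.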




For a dancer doing a pirouette, Euler’s equation is not sufficient. A dancer can transfer 
momentum internally to stop a pirouette and needs a little jump to get on demi-pointe or pointe. 
To model this, we add additional terms. 


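$$T = I\dot{\omega} + \omega^{\times}\left[I\omega + uI_i(\Omega_i + \omega_z) + uI_h(\Omega_h + \omega_z)\right] + u(T_i + T_h) \qquad (7.8)$$

$$T_i = I_i(\dot{\Omega}_i + \dot{\omega}_z) \qquad (7.9)$$

$$T_h = I_h(\dot{\Omega}_h + \dot{\omega}_z) \qquad (7.10)$$

$$F = m\ddot{z} \qquad (7.11)$$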


where $m$ is the mass, $F$ is the vertical force, and $z$ is the vertical direction. $I_i$ is the
internal inertia for control, and $I_h$ is the head inertia. $I$ includes both of these already. That
is, $I$ is the total body inertia that includes the internal "wheel," body, and head. $\Omega_i$ and
$\Omega_h$ are the angular rates of the internal "wheel" and the head relative to the body. $T_i$ is
the internal torque. $T_h$ is the head torque (for spotting). The internal torques, $T_i$ and $T_h$,
are between the body and the internal "wheel" or head. For example, $T_h$ causes the head to move
one way and the body the other. If you are standing, the torque you produce from your feet against
the floor prevents your body from rotating. $T$ is the external torque due to friction and the
initial push off by the feet. The unit vector is

$$u = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \qquad (7.12)$$


There are six equations in total. The first is a vector equation with three components, and the
other three are scalar equations. We can use these to create a simulation of a dancer. The second
equation models all z-axis internal rotation, including spotting.


7.2 Data Acquisition 
7.2.1 Problem 
We want to get data from the Bluetooth IMU. 


7.2.2 Solution 


We will use the MATLAB bluetooth function. We'll create a function to read data from the 
IMU. 
Figure 7.4: The IMU (the Bluetooth device) is connected to a MacBook Pro via USB-C and
a ubiquitous Mac dongle. This is only for charging purposes. Once it is charged, you can use
it without the dongle.


7.2.3 How It Works 


We will write an interface to the Bluetooth device. First make sure the IMU is charged. Connect
it to your computer as shown in Figure 7.4. Push the button on the back. This turns it on and
off. The status is indicated by the LED. The IMU comes with support software from the vendor,
but you will not need any of their software as MATLAB does all the hard work for you.

Let's try commanding the IMU. Type btInfo = instrhwinfo('Bluetooth') and
you should get the following:


>> btInfo = instrhwinfo('Bluetooth')
btInfo =
  HardwareInfo with properties:
        RemoteNames: {'LPMSB2-4B31D6'}
          RemoteIDs: {'btspp://00043E4B31D6'}
    BluecoveVersion: 'BlueCove-2.1.1-SNAPSHOT'
     JarFileVersion: 'Version 4.0'

Access to your hardware may be provided by a support package. Go to the
Support Package Installer to learn more.


This shows that your IMU is discoverable. There is no support package available from the
MathWorks. Now type b = Bluetooth(btInfo.RemoteIDs{1},1) (this can be slow).
The number is the channel. The Bluetooth function requires the Instrument Control Toolbox
for MATLAB.


>> b = Bluetooth(btInfo.RemoteIDs{1},1)

   Bluetooth Object : Bluetooth-btspp://00043E4B31D6:1

   Communication Settings
      RemoteName:         LPMSB2-4B31D6
      RemoteID:           btspp://00043e4b31d6
      Channel:            1
      Terminator:         'LF'

   Communication State
      Status:             closed
      RecordStatus:       off

   Read/Write State
      TransferStatus:     idle
      BytesAvailable:     0
      ValuesReceived:     0
      ValuesSent:         0


Note that the Communication State Status shows closed. We need to open the device by
typing fopen(b). If you don't have this device, you will see the following instead:


>> btInfo = instrhwinfo('Bluetooth')
btInfo =
  HardwareInfo with properties:
        RemoteNames: []
          RemoteIDs: []
    BluecoveVersion: 'BlueCove-2.1.1-SNAPSHOT'
     JarFileVersion: 'Version 4.0'

Access to your hardware may be provided by a support package. Go to the
Support Package Installer to learn more.


This says it cannot find any remote names or IDs. You may need a support package for your
device in this case.

Click connect and the device will open. Now type a = fscanf(b) and you will get a
bunch of unprintable characters. We now have to write code to command the device. We will
leave the device in streaming mode. The data unit format is shown in Table 7.2. Each packet is
really 91 bytes long even though the table only shows 67 bytes. The 67 bytes are all the useful
data.

We read the binary and put it into a data structure using DataFromIMU. The MATLAB function
typecast converts from bytes to float.
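
The byte conversion can be tried without the hardware. The sketch below is a round trip with a made-up sample value: it splits a single-precision number into four bytes and converts them back the same way BytesToFloat does. It assumes the bytes arrive in the same byte order that the host computer uses, which is something to verify against real packets.

```matlab
% A made-up sample value stored as a 32-bit float
x = single(-0.9896);

% The four bytes as they would appear in a packet (host byte order assumed)
bytes = typecast(x,'uint8');

% Reading a port typically gives doubles, so mimic that here
a = double(bytes);

% Convert the four bytes back to a float, as BytesToFloat does
r = typecast(uint8(a),'single');

fprintf('Bytes: %d %d %d %d -> %g\n',bytes,r);
```

If the values that come back from the IMU look like nonsense, a byte offset or a byte-order mismatch is the first thing to check.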
Table 7.2: Reply data.

Byte     Content (hex)   Meaning
0                        Packet start
1                        OpenMAT ID LSB
2                        OpenMAT ID MSB
3        09              Command No. LSB (9d = GETSENSORDATA)
4                        Command No. MSB
5                        Data length LSB
6                        Data length MSB
7-10                     Timestamp
11-14                    Gyroscope x
15-18                    Gyroscope y
19-22                    Gyroscope z
23-26                    Accelerometer x
27-30                    Accelerometer y
31-34                    Accelerometer z
35-46                    Not read by DataFromIMU
47-50                    Quaternion element 1
51-54                    Quaternion element 2
55-58                    Quaternion element 3
59-62                    Quaternion element 4
63-64                    Not read by DataFromIMU
65                       Message end byte 1
66       0A              Message end byte 2


DataFromIMU.m

function d = DataFromIMU( a )

d.packetStart  = dec2hex(a(1));
d.openMATIDLSB = dec2hex(a(2));
d.openMATIDMSB = dec2hex(a(3));
d.cmdNoLSB     = dec2hex(a(4));
d.cmdNoMSB     = dec2hex(a(5));
d.dataLenLSB   = dec2hex(a(6));
d.dataLenMSB   = dec2hex(a(7));
d.timeStamp    = BytesToFloat( a(8:11) );
d.gyro         = [ BytesToFloat( a(12:15) );...
                   BytesToFloat( a(16:19) );...
                   BytesToFloat( a(20:23) )];
d.accel        = [ BytesToFloat( a(24:27) );...
                   BytesToFloat( a(28:31) );...
                   BytesToFloat( a(32:35) )];
d.quat         = [ BytesToFloat( a(48:51) );...
                   BytesToFloat( a(52:55) );...
                   BytesToFloat( a(56:59) );...
                   BytesToFloat( a(60:63) )];
d.msgEnd1      = dec2hex(a(66));
d.msgEnd2      = dec2hex(a(67));

%% DataFromIMU>BytesToFloat
function r = BytesToFloat( x )

r = typecast(uint8(x),'single');


We've wrapped all of this into the script BluetoothTest.m. We print out a few samples
of the data to make sure our bytes are aligned correctly.


BluetoothTest.m

%% Script to read binary from the IMU

% Find available Bluetooth devices
btInfo = instrhwinfo('Bluetooth')

% Display the information about the first device discovered
btInfo.RemoteNames(1)
btInfo.RemoteIDs(1)

% Construct a Bluetooth Channel object to the first Bluetooth device
b = Bluetooth(btInfo.RemoteIDs{1}, 1);

% Connect the Bluetooth Channel object to the specified remote device
fopen(b);

% Get a data structure
tic

Gr. Br mu 

OE OLEO O 

  a = fread(b,91);
  d = DataFromIMU( a );
  fprintf('%12.2f [%8.1e %8.1e %8.1e] [%8.1e %8.1e %8.1e] [%8.1f %8.1f %8.1f %8.1f]\n',...
    t,d.gyro,d.accel,d.quat);
  t = t + toc;
  tic
end


When we run the script we get the following output. 
>> BluetoothTest
btInfo =
  HardwareInfo with properties:
        RemoteNames: {'LPMSB2-4B31D6'}
          RemoteIDs: {'btspp://00043E4B31D6'}
    BluecoveVersion: 'BlueCove-2.1.1-SNAPSHOT'
     JarFileVersion: 'Version 4.0'

Access to your hardware may be provided by a support package. Go to the
Support Package Installer to learn more.

ans =
  1x1 cell array
    {'LPMSB2-4B31D6'}
ans =
  1x1 cell array
    {'btspp://00043E4B31D6'}


ans - 


1x11 single row vector 


1.0000 0.0014 
0.9200 -0.0037 
ans - 
1x11 single row vector 
2.0000 -0.0008 
0.9200 -0.0037 
ans - 
1x11 single row vector 
3.0000 0.0004 
0.9200 -0.0037 


0.0023 


0.0023 


0.0023 


-0.0022 0.0019 210/0305 
0.0144 ORS Out 

-0.0016 0.0029 = Oe, Obes 
0.0144 ORS OAS 

-0.0025 0.0028 2:0::0452:5 
0.0144 013/9165 




-0.9896 


ES DEl 


-0.9900 


The first number in each row vector is the sample, the next three are the angular rates from
the gyro, the next three the accelerations, and the last four the quaternion. The acceleration is
mostly in the -z direction, which means that +z is in the button direction. Bluetooth, like all
wireless connections, can be problematic. If you get this error

Index exceeds the number of array elements (0).

Error in BluetoothTest (line 7)
btInfo.RemoteNames(1)

turn the IMU off and on. The error occurs because RemoteNames is empty, and this test
assumes it will not be. You might also have to restart MATLAB at times.
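
One way to avoid the indexing error is to check for an empty list before using it. The following is a minimal sketch that uses a stand-in structure in place of the output of instrhwinfo, so it runs without the Instrument Control Toolbox. The empty cell arrays and the message text are our assumptions.

```matlab
% Stand-in for the structure returned by instrhwinfo('Bluetooth')
btInfo.RemoteNames = {};
btInfo.RemoteIDs   = {};

% Only index into the list if a device was found
if( isempty(btInfo.RemoteIDs) )
  id = '';
  disp('No Bluetooth devices found. Turn the IMU off and on and try again.');
else
  id = btInfo.RemoteIDs{1};
end
```

In a real script, the call to Bluetooth would go inside the else branch so that it is skipped when no device is discovered.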


7.3 Orientation 
7.3.1 Problem 


We want to use quaternions to represent the orientation of our dancer in our deep learning 
system. 


7.3.2 Solution 


Implement basic quaternion operations. We need quaternion operations to process the quater- 
nions from the IMU. 


7.3.3 How It Works 


Quaternions are the preferred mathematical representation of orientation. Propagating a
quaternion requires fewer operations than propagating a transformation matrix and avoids
singularities that occur with Euler angles. A quaternion has four elements, which correspond to a
unit vector $a$ and angle of rotation $\phi$ about that vector. The first element is termed the
"scalar component" $s$, and the next three elements are the "vector" components $v$. This
notation is shown as follows [21]:

$$q = \begin{bmatrix} q_0 \\ q_1 \\ q_2 \\ q_3 \end{bmatrix}
= \begin{bmatrix} s \\ v_1 \\ v_2 \\ v_3 \end{bmatrix}
= \begin{bmatrix} \cos\frac{\phi}{2} \\ a_1 \sin\frac{\phi}{2} \\ a_2 \sin\frac{\phi}{2} \\ a_3 \sin\frac{\phi}{2} \end{bmatrix} \qquad (7.13)$$

The "unit" quaternion which represents zero rotation from the initial coordinate frame has
a unit scalar component and zero vector components. This is the same convention used on the
Space Shuttle, although other conventions are possible.

$$q_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \qquad (7.14)$$


In order to transform a vector from one coordinate frame $a$ to another $b$ using a quaternion
$q_{ab}$, the operation is

$$u_b = q_{ab}^T u_a q_{ab} \qquad (7.15)$$

using quaternion multiplication with the vectors defined as quaternions with a scalar part equal
to zero, or

$$u_a = \begin{bmatrix} 0 \\ u_a(1) \\ u_a(2) \\ u_a(3) \end{bmatrix} \qquad (7.16)$$

For example, the quaternion

$$\begin{bmatrix} 0.7071 \\ 0.7071 \\ 0.0 \\ 0.0 \end{bmatrix} \qquad (7.17)$$

represents a pure rotation about the x-axis. The first element is 0.7071 and equals cos(90°/2).
We cannot tell the direction of rotation from the first element. The second element is the first
component of the unit vector, which in this case is

$$\begin{bmatrix} 1.0 \\ 0.0 \\ 0.0 \end{bmatrix} \qquad (7.18)$$

times sin(90°/2). Since the sign is positive, the rotation must be a positive 90°
rotation.
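
The example quaternion can be reproduced directly from the axis and angle in Equation 7.13. This is a small sketch with the scalar element first, as in the text. The variable names are ours.

```matlab
% Unit vector along the x-axis and a 90 degree rotation
a   = [1;0;0];
phi = pi/2;

% Scalar element first, then the vector elements
q = [cos(phi/2); a*sin(phi/2)];

disp(q')
```

The norm of q is one for any unit axis, which is a handy check on quaternions read from the IMU.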

We only need one routine that converts the quaternion, which comes from the IMU, into a 
transformation matrix for visualization. We do this because multiplying a 3x n array of vectors 
for the vertices of our 3D model by a matrix is much faster than transforming each vector with 
a quaternion. 


QuaternionToMatrix.m

function m = QuaternionToMatrix( q )

m = [ q(1)^2+q(2)^2-q(3)^2-q(4)^2,...
      2*(q(2)*q(3)-q(1)*q(4)),...
      2*(q(2)*q(4)+q(1)*q(3));...
      2*(q(2)*q(3)+q(1)*q(4)),...
      q(1)^2-q(2)^2+q(3)^2-q(4)^2,...
      2*(q(3)*q(4)-q(1)*q(2));...
      2*(q(2)*q(4)-q(1)*q(3)),...
      2*(q(3)*q(4)+q(1)*q(2)),...
      q(1)^2-q(2)^2-q(3)^2+q(4)^2];


Note that the diagonal terms have the same form. The off-diagonal terms also all have the 
same form. 
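
As a check on the conversion, the sketch below applies the same nine terms to the 90° x-axis quaternion from the earlier example. It uses an anonymous function, here called quatToMat (a hypothetical name), so that it runs as a standalone script. It is a stand-in for illustration, not the library function itself.

```matlab
% Quaternion to matrix, scalar element first (standalone stand-in)
quatToMat = @(q) [...
  q(1)^2+q(2)^2-q(3)^2-q(4)^2, 2*(q(2)*q(3)-q(1)*q(4)), 2*(q(2)*q(4)+q(1)*q(3));...
  2*(q(2)*q(3)+q(1)*q(4)), q(1)^2-q(2)^2+q(3)^2-q(4)^2, 2*(q(3)*q(4)-q(1)*q(2));...
  2*(q(2)*q(4)-q(1)*q(3)), 2*(q(3)*q(4)+q(1)*q(2)), q(1)^2-q(2)^2-q(3)^2+q(4)^2];

% 90 degree rotation about x
q = [cos(pi/4); sin(pi/4); 0; 0];
m = quatToMat(q);

% The unit quaternion should give the identity matrix
mUnit = quatToMat([1;0;0;0]);

disp(m)
```

For a unit quaternion the result is orthonormal, so m*m' should be the identity to within roundoff.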
7.4 Dancer Simulation
7.4.1 Problem 


We want to simulate a dancer for readers who don’t have access to the hardware. 


7.4.2 Solution 


We will write a right-hand side for the dancer based on the preceding equations and write a 
simulation with a control system. 


7.4.3 How It Works 


The right-hand side implements the dancer model. It includes an internal control "wheel" and
a degree of freedom for the head movement. The default data structure is returned if you call it
without arguments.


RHSDancer.m

%% RHSDANCER Implements dancer dynamics
% This is a model of dancer with one degree of translational freedom
% and 5 degrees of rotational freedom including the head and an internal
% rotational degree of freedom.
%% Form:
%  xDot = RHSDancer( t, x, d )
%% Inputs
%   x     (11,1)  State vector [r;v;q;w;wHDot;wIDot]
%   t     (1,1)   Time (unused) (s)
%   d     (.)     Data structure for the simulation
%                 .torque   (3,1) External torque (Nm)
%                 .force    (1,1) External force (N)
%                 .inertia  (3,3) Body inertia (kg-m^2)
%                 .inertiaH (1,1) Head inertia (kg-m^2)
%                 .inertiaI (1,1) Inner inertia (kg-m^2)
%                 .mass     (1,1) Dancer mass (kg)
%
%% Outputs
%   xDot  (11,1)  d[r;v;q;w;wHDot;wIDot]/dt

function xDot = RHSDancer( ~, x, d )

% Default data structure
if( nargin < 1 )
  % Based on a 0.15 m radius, 1.4 m long cylinders
  inertia = diag([8.4479 8.4479 0.5625]);

59 xDot S tee VEEE ORO AOI, UE OnrcouA OP end ad Ee IE dee 

60 Amasse SO It] est ase 010:3:3 in exte da ESO 0292 toT cre EUM OPI t: CTS TUE lí 
2101) 87 

  return
end

The remainder mechanizes the equations given earlier. We add an additional equation for
the integral of the z-axis rate. This makes the control system easier to write. We also include
the gravitational acceleration in the force equation.


% Use local variables
v  = x(2);
q  = x(3:6);
w  = x(7:9);
wI = x(11);
wH = x(10);

% Unit vector
u = [0;0;1];

% Gravity
g = 9.806;

% Attitude kinematics (not mentioned in the text)
qDot = QIToBDot( q, w );

% Rotational dynamics Equation 7.8
wDot  = d.inertia\(d.torque - Skew(w)*(d.inertia*w + d.inertiaI*(wI + w(3))...
        + d.inertiaH*(wH + w(3))) - u*(d.torqueI + d.torqueH));
wHDot = d.torqueH/d.inertiaH - wDot(3);
wIDot = d.torqueI/d.inertiaI - wDot(3);

% Translational dynamics
vDot = d.force/d.mass - g;

% Assemble the state vector
xDot = [v; vDot; qDot; wDot; wHDot; wIDot; w(3)];
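
The listing calls Skew, which is not shown in the text. The sketch below is our guess at what such a function does, written as an anonymous function: it builds the matrix that turns a cross product into a matrix multiplication. The angular rate is an assumed value, and the inertia is the default from RHSDancer.

```matlab
% Skew-symmetric matrix so that skew(w)*h equals cross(w,h) (assumed behavior)
skew = @(w) [0, -w(3), w(2); w(3), 0, -w(1); -w(2), w(1), 0];

% Assumed angular rate (rad/s) and the default body inertia (kg-m^2)
w       = [0.1;0.2;3];
inertia = diag([8.4479 8.4479 0.5625]);

% Angular momentum and the Euler coupling term from Equation 7.1
h        = inertia*w;
coupling = skew(w)*h;

% Angular acceleration with no external torque
wDot = inertia\(-coupling);

disp(coupling')
```

For this diagonal inertia, the z component of the coupling term is zero because the x and y inertias are equal, so a pure spin stays a pure spin.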


The simulation setup gets default parameters from RHSDancer.

d      = RHSDancer;
n      = 800;
dT = ls
xP     = zeros(16,n);
x      = zeros(12,1);
x(3)   = 1;
g      = 9.806;
dancer = 'Robot';


It then sets up the control system. We use a proportional derivative controller for z position
and a rate damper to stop the pirouette. The position control is done by the foot muscles. The
rate damping is our internal damper wheel.

% Control system for 2 pirouettes in 6 seconds
tPirouette  = 6;
zPointe     = 6*0.0254;
tPointe =O IS:
kP          = tPointe/dT;
omega       = 4*pi/tPirouette;
torquePulse = d.inertia(3,3)*omega/tPointe;
tFriction eode
a           = 2*zPointe/tPointe^2 + g;
kForce 210005
tau =, ONE
thetaStop   = 4*pi - pi/4;
kTorque     = 200;
state       = zeros(10,n);
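
The pulse sizes follow from simple kinematics. The sketch below works through the arithmetic with an assumed pulse duration tPointe of 0.1 s, chosen only for illustration. The z inertia is the default from RHSDancer. The check variables omegaCheck, zCheck, and turns are ours.

```matlab
tPirouette = 6;          % s, time for the double pirouette
izz        = 0.5625;     % kg-m^2, default z inertia from RHSDancer
zPointe    = 6*0.0254;   % m, rise onto pointe
g          = 9.806;      % m/s^2
tPointe    = 0.1;        % s, ASSUMED pulse duration for illustration

% Rate needed for two turns (4*pi radians) in tPirouette
omega = 4*pi/tPirouette;

% Constant torque that reaches omega in tPointe
torquePulse = izz*omega/tPointe;

% Constant acceleration that rises zPointe in tPointe against gravity
a = 2*zPointe/tPointe^2 + g;

% Integrate the pulses back as a check
omegaCheck = torquePulse*tPointe/izz;
zCheck     = 0.5*(a - g)*tPointe^2;
turns      = omega*tPirouette/(2*pi);

fprintf('omega %.4f rad/s, torque %.2f Nm, accel %.2f m/s^2\n',...
  omega,torquePulse,a);
```

A shorter pulse needs proportionally more torque and quadratically more force, which is one way that different dancers can be represented in the simulation.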


The simulation loop calls the right-hand side and the control system. We call RHSDancer.m 
to get the linear acceleration. 


%% Simulate
for k = 1:n
  d.torqueH = 0;
  d.torqueI = 0;

  % Get the data for use in the neural network
  xDot = RHSDancer(0,x,d);

  state(:,k) = [x(7:9);0;0;xDot(2);x(3:6)];

  % Control
  if( k < kP )
    d.force  = d.mass*a;
    d.torque = [0;0;torquePulse];
  else
    d.force  = kForce*(zPointe-x(1)-x(2)/tau) + d.mass*g;
    d.torque = [0;0;-tFriction];
  end

  if( x(12) > thetaStop )
    d.torqueI = kTorque*x(9);
  end

  xP(:,k) = [x;d.force;d.torque(3);d.torqueH;d.torqueI];
  x       = RungeKutta(@RHSDancer,0,x,dT,d);
end

The control system includes a torque and force pulse to get the pirouette going.

% Control
if( k < kP )
  d.force  = d.mass*a;
  d.torque = [0;0;torquePulse];
else
  d.force  = kForce*(zPointe-x(1)-x(2)/tau) + d.mass*g;
  d.torque = [0;0;-tFriction];
end

The remainder of the script plots the results and outputs the data, which would have come
from the IMU, into a file.
Simulation results for a double pirouette are shown in Figure 7.5. We stop the turn at 6.5
seconds, hence the pulse.

You can create different dancers by varying the mass properties and the control parameters.

zPointe = 6*0.0254;
tPointe = (ap
tFriction = (Oil,
kForce Es OOO
tau = (Qo
kTorque = 200;


We didn't implement spotting control (looking at the audience as much as possible during
the turn). It would rotate the head so that it faces forward whenever the head was within 90
degrees or so of front. We'd need to add a head angle for that purpose to the right-hand side,
much like we added the z-axis angle.

Figure 7.5: Simulation of a double pirouette.


7.5 Real-Time Plotting 
7.5.1 Problem 


We want to display data from the IMU in real time. This will allow us to monitor the pirouettes. 


7.5.2 Solution 


Use plot with drawnow to implement multiple figures of plots. 


7.5.3 How It Works 


The main function is a switch statement with two cases. The function also has a built-in demo. 
The first case, initialize, initializes the plot figures. It stores everything in a data structure 
that is returned on each function call. This is one way for a function to have a memory. We 
return the data structure from each subfunction. 


GUIPlots.m

switch( lower(action) )
  case 'initialize'
    g = Initialize( g );

  case 'update'
    g = Update( g, y, t );
end


The first case, initialize, initializes the figure window.

%% GUIPlots>Initialize
function g = Initialize( g )

lY = length(g.yLabel);

% Create tLim if it does not exist
if( ~isfield(g, 'tLim' ) )
gata [LOSS] IS
end

g.tWidth = g.tLim(2) - g.tLim(1);

% Create yLim if it does not exist
if( ~isfield(g, 'yLim' ) )
  g.yLim = [-ones(lY,1), ones(lY,1)];
end

% Create the plots
lP = length(g.yLabel);
y  = g.pos(2); % The starting y position
for k = 1:lP
  g.h(k) = subplot(lP,1,k);
  set(g.h(k),'position',[g.pos(1) y g.pos(3) g.pos(4)]);
  y = y - 1.4*g.pos(4);
  g.hPlot(k)  = plot(0,0);
  g.hAxes(k)  = gca;
  g.yWidth(k) = (g.yLim(k,2) - g.yLim(k,1))/2;
  set(g.hAxes(k),'nextplot','add','xlim',g.tLim);
  ylabel( char(g.yLabel{k}) )
  grid on
end
xlabel( g.tLabel );


The second case, update, updates the data displayed in the plot. It leaves the existing figures,
subplots, and labels in place and just updates the plots of the line segments with new data. It can
change the size of the axes as needed. The function adds a line segment for each new data point.
This way no storage is needed external to the plot. It reads xdata and ydata and appends the
new data to those arrays.
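
The update logic can be exercised without opening a figure. The sketch below keeps the data in the structure instead of in the plot handles, which is a simplification of what GUIPlots does. The fields xData and yData, the initial time limits, and the counter nExpand are hypothetical and chosen for illustration.

```matlab
% Stand-in for the plot state; tLim and tWidth follow GUIPlots, values assumed
g.tLim   = [0 100];
g.tWidth = g.tLim(2) - g.tLim(1);
g.xData  = [];   % hypothetical field standing in for the line's xdata
g.yData  = [];   % hypothetical field standing in for the line's ydata
nExpand  = 0;

for t = 1:250
  y = sin(t/100);

  % Widen the time axis when the new point falls past the upper limit
  if( t > g.tLim(2) )
    g.tLim(2) = g.tWidth + g.tLim(2);
    nExpand   = nExpand + 1;
  end

  % Append the new point, as Update does with get and set
  g.xData = [g.xData t];
  g.yData = [g.yData y];
end

fprintf('Upper limit %g after %d expansions\n',g.tLim(2),nExpand);
```

Because the structure is passed in and returned on every call, the function itself needs no persistent variables.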


function g = Update( g, y, t )

% See if the time limits have been exceeded
if( t > g.tLim(2) )
  g.tLim(2)  = g.tWidth + g.tLim(2);
  updateAxes = true;
else
  updateAxes = false;
end

lP = length(g.yLabel);
for k = 1:lP
  subplot(g.h(k));
  yD = get(g.hPlot(k),'ydata');
  xD = get(g.hPlot(k),'xdata');
  if( updateAxes )
    set( gca, 'xLim', g.tLim );
    set( g.hPlot(k), 'xdata',[xD t],'ydata',[yD y(k)]);
  else
    set( g.hPlot(k), 'xdata',[xD t],'ydata',[yD y(k)]);
  end
end


The built-in demo plots six numbers. It updates the axes in time once. It sets up a figure
window with six plots. You need to create the figure and save the figure handle before calling
GUIPlots.

g.hFig = NewFig('State');

The pause in the demo just slows down plotting so that you can see the updates. The height
(the last number in g.pos) is the height of each plot. If you happen to set the locations of the
plots out of the figure window, you will get a MATLAB error. g.tLim gives the initial time
limits in seconds. The upper limit will expand as data is entered.


2 function Demo 

3 

A Gota = (Usu) yy) 0590 Ose 16 Cu. 10 Pa ake hs 

5 g.tLabel = ‘Time (sec)'; 

6 g.tLim = LOM O 

7 g.pos = E). 3510/0) 0.88 0.8 omole 

a eroen = abe 

G cuadler = 19% 

0 

1 g.hFig = NewFig('State'); 

2 set(g.hFig, 'NumberTitle','off' ); 

3 

A e] -EGUHBiotsS matt SE 
5 

6 for k = 1:200 

7 y = 0.1«[cos((k/100))-0.05;sin(k/100)]; 

8 g = GUEPTOtS “Mesa. (Y Ras ds Sb ey 
9 pause(0.1) 

20 end 

21 

m ej = (eue oenen ass 1, llo e 5 


23 

24 for k = 1:200 

25 y = 0.1x[cos((k/100))-0.05;sin(k/100)]; 

26 g = GUIPlots( ‘update’, [y;y.^2;2«yl, k, g ); 
27 pause (0.1) 

28 end 


Figure 7.6 shows the real-time plots at the end of the demo. 


7.6  Quaternion Display 
7.6.1 Problem 


We want to display the dancer’s orientation in real time. 
Figure 7.6: Real-time plots.

7.6.2 Solution


Use patch to draw an OBJ model in a three-dimensional plot. The figure is easier to
understand than the four quaternion elements. Our solution can handle 3-axis rotation although
typically we will only see single-axis rotation.


7.6.3 How It Works 


We start with our Ballerina.obj file. It only has vertices and faces. A 3D drawing consists
of a set of vertices. Each vertex is a point in space. The vertices are organized into faces. Each
face is a triangle. Triangles are used for 3D drawings because they always form a plane. 3D
processing hardware is designed to work with triangles, so this also gives the fastest results.
The obj files for our software can only contain triangles. Each face can have only three
vertices. Generally, obj files can have polygons of any size, that is, faces with more than three
points. Most sources of obj files can provide tessellation services to convert polygons with
more than three vertices into triangles. LoadOBJ.m will not draw models with anything other
than triangular faces.
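
If a face does have more than three vertices, the simplest conversion is a fan of triangles that all share the first vertex. The sketch below shows this for one five-sided face. It is only valid for convex faces, it is our illustration, and it is not part of LoadOBJ.

```matlab
% One polygon face given as vertex indices (made-up example)
face = [1 2 3 4 5];
n    = numel(face);

% Fan triangulation: every triangle shares the first vertex
tri = [repmat(face(1),n-2,1), face(2:n-1)', face(3:n)'];

disp(tri)
```

Each row of tri is one triangular face in the form that patch expects for its faces input.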

The main part of the function uses a case statement to handle the three actions. The first 
action just returns the defaults, which is the name of the default obj file. The second reads in the 
file and initializes the patches. The third updates the patches. patch is the MATLAB name 
for a set of triangles. The function can be passed a figure handle. A figure handle tells it into 
which figure the 3D model should be drawn. This allows it to be used as part of a GUI, as will 
be shown in the next section. 
QuaternionVisualization.m

function m = QuaternionVisualization( action, x, f )

persistent p

% Demo
if( nargin < 1 )
  Demo
  return
end

switch( lower(action) )
  case 'defaults'
    m = Defaults;

  case 'initialize'
    if( nargin < 2 )
      d = Defaults;
    else
      d = x;
    end

    if( nargin < 3 )
      f = [];
    end

    p = Initialize( d, f );

  case 'update'
    if( nargout == 1 )
      m = Update( p, x );
    else
      Update( p, x );
    end
end


Initialize loads the obj file. It creates a figure and saves the object data structure. It 
sets shading to interpolated and lighting to Gouraud. Gouraud is a type of lighting model named 
after its inventor. It then creates the patches and sets up the axis system. We save handles to all 
the patches for updating later. We also place a light. 


function p = Initialize( file, f )

if( isempty(f) )
  p.fig = NewFigure( 'Quaternion' );
else
  p.fig = f;
end

g   = LoadOBJ( file );
p.g = g;

shading interp
lighting gouraud

CASOS n0) 2 foy n

for k = 1:length(g.component)
  p.model(k) = patch('vertices', g.component(k).v, 'faces', g.component(k).f,...
    'facecolor',c,'edgecolor',c,'ambient',1,'edgealpha',0 );
end

xlabel('x');
ylabel('y');
zlabel('z');

grid
rotate3d on
set(gca,'DataAspectRatio',[1 1 1],'DataAspectRatioMode','manual')

light('position',10*[1 1 1])

view([1 1 1])


In Update we convert the quaternion to a matrix, because it is faster to transform all the
vertices with one matrix multiplication. The vertices are stored as an n-by-3 array, so we
transpose before the matrix multiplication. We use the patch handles to update the vertices.
The two options at the end are to create movie frames or just update the drawing.


function m = Update( p, q )

s = QuaternionToMatrix( q );

for k = 1:length(p.model)
  v = (s*p.g.component(k).v')';
  set(p.model(k),'vertices',v);
end

if( nargout > 0 )
  m = getframe;
else
  drawnow;
end
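The transpose step in Update can be tried in isolation. The following is a minimal sketch, not the book's code: it assumes a scalar-first unit quaternion for a rotation about z and builds the rotation matrix by hand instead of calling QuaternionToMatrix, whose sign convention is not shown in this section. The vertex array v0 and the angle are made up for illustration.

```matlab
% Hypothetical data: three vertices stored one per row (n-by-3), as in an obj file
v0    = [1 0 0; 0 1 0; 0 0 1];
angle = pi/2;                          % rotate 90 degrees about z

% Scalar-first unit quaternion for a rotation about z (assumed convention)
q = [cos(angle/2); 0; 0; sin(angle/2)];

% Rotation matrix about z built from the quaternion elements
c = q(1)^2 - q(4)^2;                   % cos(angle)
s = 2*q(1)*q(4);                       % sin(angle)
r = [c -s 0; s c 0; 0 0 1];

% Transpose to 3-by-n, rotate every vertex with one multiply, transpose back
v = (r*v0')';

disp(v)
```

The first vertex, which starts on the x-axis, ends up on the y-axis. Doing the multiply once on the whole 3-by-n array avoids a loop over vertices, which matters for a model with many vertices.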


This is the built-in demo. We vary the first and fourth elements of the quaternion to get
rotation about the z-axis.


function Demo

QuaternionVisualization( 'initialize', 'Ballerina.obj' );
Figure 7.7: Dancer orientation. The obj file is by the artist loft 22 and is available from
TurboSquid.


Figure 7.7 shows two orientations of the dancer during the demo. The demo produces an 
animation of the dancer rotating about the z-axis. The rotation is slow because of the number of 
vertices. The figure is not articulated so the entire figure is rotated as a rigid body. MATLAB 
doesn’t make it easy to texture map so we don’t bother. In any case, the purpose of this function 
is just to show orientation so it doesn’t matter. 


7.7 Data Acquisition GUI 
7.7.1 Problem 


Build a data acquisition GUI to display the real-time data and output it into training sets. 


7.7.2 Solution 
Integrate all the preceding recipes into a GUI. 


7.7.3 How It Works 


We aren’t going to use MATLAB’s Guide to build our GUI. We will hand code it, which will 
give you a better idea of how a GUI really works. 

We will use nested functions for the GUI. The inner functions have access to all variables 
in the outer functions. This also makes using callbacks easy as shown in the following code 
snippet. 


function DancerGUI( file )

  function DrawGUI( h )

    uicontrol( h.fig, 'callback', @SetValue );

    function SetValue( hObject, ~, ~ )
      % do something
    end

  end
end


A callback is a function called by a uicontrol when the user interacts with the control. 
When you first open the GUI, it will look for the bluetooth device. This can take a while. 

Everything in DrawGUI has access to variables in DancerGUI. The GUI is shown in 
Figure 7.8. The 3D orientation display is in the upper left corner. Real-time plots are on the 
right. Buttons are on the lower left, and the movie window is on the right. 
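The variable sharing that nested functions provide can be seen without any GUI. This is a minimal sketch under stated assumptions: NestedCounterSketch and Increment are hypothetical names, and the nested function is called through a handle to mimic the way a uicontrol would invoke a callback.

```matlab
function count = NestedCounterSketch
% NestedCounterSketch is a hypothetical name; save it as NestedCounterSketch.m
count = 0;            % variable owned by the outer function

  function Increment( ~, ~ )
    % A callback-style nested function: it sees and modifies count directly
    count = count + 1;
  end

% Call the nested function through a handle, the way a uicontrol would
cb = @Increment;
cb( [], [] );
cb( [], [] );
end
```

Calling NestedCounterSketch returns 2. No global variables or handle structures are needed to pass count to the callback, which is the reason the GUI in this recipe is written with nested functions.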

The upper left picture shows the dancer's orientation. The plots on the right show angular
rates and accelerations from the IMU. From top to bottom, the controls are

1. Turn the 3D on/off. The default model is big, so unless you add your own model with
   fewer vertices, it should be set to off.

2. The text box to its right is the name of the file. The GUI will add a number to the right
   of the name for each run.

3. Save saves the current data to a file.

4. Calibrate sets the default orientation and sets the gyro rates and accelerations to whatever
   it is reading when you hit the button. The dancer should be still when you hit calibrate. It
   will automatically compute the gravitational acceleration and subtract it during the test.

5. Quit closes the GUI.

6. Clear data clears out all the internal data storage.

7. Start/Stop starts and stops the GUI.


The remaining three lines display the time, the angular rate vector, and the acceleration vector 
as numbers. This is the same data that is plotted. 

The first part creates the figure and draws the GUI. It initializes all the fields for GUIPlots. 
It reads in a default picture for the movie window as a placeholder. 


DancerGUI.m

function DancerGUI( file )

% Values marked (*) are illegible in the source listing and are assumed here.

% Demo
if( nargin < 1 )
  DancerGUI('Ballerina.obj');
  return
end

% Storage of data needed by the deep learning system
kStore      = 1;
accelStore  = zeros(3,1000);
gyroStore   = zeros(3,1000);
quatStore   = zeros(4,1000);
timeStore   = zeros(1,1000);
time        = 0;
on3D        = false;
quitNow     = false;

sZ = get(0,'ScreenSize') + [99 99 -200 -200];

h.fig = figure('name','Dancer Data Acquisition','position',sZ,'units','pixels',...
  'NumberTitle','off','tag','DancerGUI','color',[0.9 0.9 0.9]); % color (*)

% Plot display
gPlot.yLabel = {'\omega_x' '\omega_y' '\omega_z' 'a_x' 'a_y' 'a_z'};
gPlot.tLabel = 'Time (sec)';
gPlot.tLim   = [0 100];             % (*)
gPlot.pos    = [0.45 0.88 0.46 0.1]; % first and last elements (*)
gPlot.color  = 'b';                 % (*)
gPlot.width  = 1;                   % (*)

% Calibration
q0 = [1;0;0;0];   % unit quaternion (*)
a0 = [0;0;9.8];   % (*)

dIMU.accel = a0;
dIMU.quat  = q0;

% Initialize the GUI
DrawGUI;

Figure 7.8: Data acquisition GUI.


The notation '\omega_x' is LaTeX format. This will generate ω_x.
The next part tries to find Bluetooth. It first sees if Bluetooth is available at all. It then 
enumerates all Bluetooth devices. It looks through the list to find our IMU. 


btInfo = instrhwinfo('Bluetooth');
if( ~isempty(btInfo.RemoteIDs) )
  % Display the information about the first device discovered
  btInfo.RemoteNames(1)
  btInfo.RemoteIDs(1)
  for iB = 1:length(btInfo.RemoteIDs)
    if( strcmp(btInfo.RemoteNames(iB),'LPMSB2-4B31D6') )
      break;
    end
  end
  b     = Bluetooth(btInfo.RemoteIDs{iB}, 1);
  fopen(b); % No output allowed for some reason
  noIMU = false;
  a     = fread(b,91);
  dIMU  = DataFromIMU( a );
else
  warndlg('The IMU is not available.', 'Hardware Configuration');
  noIMU = true;
end


The following is the run loop. If no IMU is present, it synthesizes data. If the IMU is 
found, the GUI reads data from the IMU in 91 byte chunks. The uiwait is to wait until the 
user hits the start button. When used for testing, the IMU should be on the dancer. The dancer 
should remain still when the start button is pushed. It will then calibrate the IMU. Calibration 
fixes the quaternion reference and removes the gravitational acceleration. You can also hit the 
calibration button at any time. 
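The gravity removal performed by calibration amounts to subtracting the at-rest accelerometer reading from every later sample. The following is a minimal sketch with made-up numbers, not the GUI code. It assumes the sensor orientation stays close to its calibration orientation, since otherwise the gravity vector measured by the sensor would change direction.

```matlab
% Hypothetical accelerometer samples (m/s^2), one column per sample.
% The first column is taken while the dancer stands still.
accel = [ 0.1  0.1  0.3;...
         -0.2 -0.2  0.0;...
          9.8  9.8 11.8];

a0 = accel(:,1);                                 % reading at rest, mostly gravity
accelCal = accel - repmat(a0,1,size(accel,2));   % subtract it from every sample

disp(accelCal)
```

After calibration the at-rest samples read zero, so what remains in the later samples is the acceleration due to the dancer's motion.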


% Wait for user input
uiwait;

% The run loop
% (one line here is illegible in the source listing)
tic
while(1)
  if( noIMU )
    omegaZ  = 2*pi;
    dT      = toc;
    time    = time + dT;
    tic
    a       = omegaZ*time;
    q       = [cos(a);0;0;sin(a)];
    accel   = [0;0;sin(a)];
    omega   = [0;0;omegaZ];
  else
    % Query the bluetooth device
    a     = fread(b,91);
    pause(0.1); % needed so not to overload the bluetooth device
    dT    = toc;
    time  = time + dT;
    tic
    % Get a data structure
    if( length(a) > 1 )
      dIMU = DataFromIMU( a );
    end
    accel = dIMU.accel - a0;
    omega = dIMU.gyro;
    q     = QuaternionMultiplication(q0,dIMU.quat);
    timeStore(1,kStore)   = time;
    accelStore(:,kStore)  = accel;
    gyroStore(:,kStore)   = omega;
    quatStore(:,kStore)   = q;
    kStore = kStore + 1;
  end


This code closes the GUI and displays the IMU data.

    if( quitNow )
      close( h.fig )
      return
    else
      if( on3D )
        QuaternionVisualization( 'update', q );
      end
      set(h.text(1),'string',sprintf('[%5.2f;%5.2f;%5.2f] m/s^2',accel));
      set(h.text(2),'string',sprintf('[%5.2f;%5.2f;%5.2f] rad/s',omega));
      set(h.text(3),'string',datestr(now));
      gPlot = GUIPlots( 'update', [omega;accel], time, gPlot );
    end
  end

The drawing code uses uicontrol to create all the buttons. GUIPlots and
QuaternionVisualization are also initialized. The uicontrols that require an action
have callbacks.

%% DancerGUI>DrawButtons
function DrawGUI

  % Values marked (*) are illegible in the source listing and are assumed here.

  % Plots
  gPlot = GUIPlots( 'initialize', [], h.fig, gPlot ); % argument list (*)

  % Quaternion display
  subplot('position',[0.05 0.5 0.4 0.4],'DataAspectRatio',[1 1 1],...
          'PlotBoxAspectRatio',[1 1 1] );

  QuaternionVisualization( 'initialize', file, h.fig );

  % Buttons
  f   = {'Acceleration', 'Angular Rates' 'Time'};
  n   = length(f);
  p   = get(h.fig,'position');
  dY  = p(4)/20;
  yH  = p(4)/21;
  y   = 0.5;   % (*)
  x   = 0.15;  % (*)
  wX  = p(3)/6;

  % Create pushbuttons and defaults
  for k = 1:n
    h.pushbutton(k) = uicontrol( h.fig,'style','text','string',f{k},...
                                 'position',[x y wX yH]);
    h.text(k)       = uicontrol( h.fig,'style','text','string','',...
                                 'position',[x+wX y 2*wX yH]);
    y = y + dY;
  end
uicontrol takes parameter pairs, except for the first argument, which can be a figure handle.
There are a lot of parameter pairs. The easiest way to explore them is to type

h = uicontrol;
get(h)

All types of uicontrol that handle user interaction have "callbacks," which are functions
that do something when the button is pushed or a menu item is selected. We have five
uicontrols with callbacks. The first uses uiwait and uiresume to start and stop data
collection.


% Start/Stop button callback
function StartStop( hObject, ~, ~ )
  if( hObject.Value )
    uiresume;
  else
    SaveFile;
    uiwait
  end
end


The second uses questdlg to ask if you want to save the data that has been stored in the 
GUI. This produces the modal dialog shown in Figure 7.9. 


% Quit button callback
function Quit( ~, ~, ~ )
  button = questdlg('Save Data?','Exit Dialog','Yes','No','No');
  switch button
    case 'Yes'
      % Save data

    case 'No'
  end

  quitNow = true;
  uiresume
end

Figure 7.9: Modal dialog.
The third, Clear, clears the data storage arrays. It resets the quaternion to a unit
quaternion.


% Clear button callback
function Clear( ~, ~, ~ )
  kStore      = 1;
  accelStore  = zeros(3,1000);
  gyroStore   = zeros(3,1000);
  quatStore   = zeros(4,1000);
  timeStore   = zeros(1,1000);
  time        = 0;
end


The fourth, Calibrate, runs the calibration procedure.

% Calibrate button callback
function Calibrate( ~, ~, ~ )
  a     = fread(b,91);
  dIMU  = DataFromIMU( a );
  a0    = dIMU.accel;
  q0    = dIMU.quat;
  QuaternionVisualization( 'update', q0 )
end


The fifth, SaveFile, saves the recorded data into a mat file for use by the deep learning
algorithm.

% Save button callback
function SaveFile( ~, ~, ~ )
  % Lines marked (*) are illegible in the source listing and are assumed here.
  cd TestData
  fileName = get(h.matFile,'string');
  s = dir;
  n = length(s);
  fNames = cell(1,n-2);
  for kF = 3:n
    fNames{kF-2} = s(kF).name(1:end-4);
  end
  j = contains(fNames,fileName);
  m = 0;                             % (*)
  if( ~isempty(j) )
    for kF = 1:length(j)
      if( j(kF) )
        f = fNames{kF};
        i = strfind(f,'_');          % (*)
        m = str2double(f(i+1:end));
      end
    end
  end
We make it easier for the user to save files by reading the directory and adding a number to
the end of the dancer filename that is one greater than the last filename number.
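The file numbering idea can be sketched on its own, without touching the disk. This is an illustrative variant rather than the SaveFile code: the file names are made up, the name_number pattern is an assumption based on the file names used later in the chapter, and the sketch takes the maximum existing number rather than the last one found.

```matlab
% Hypothetical directory listing, without the .mat extension
fNames   = {'Ryoko_1','Ryoko_2','Shaye_1','Ryoko_10'};
fileName = 'Ryoko';

m = 0;                                   % highest run number found so far
for k = 1:length(fNames)
  % Match names of the form <fileName>_<digits> and capture the digits
  tok = regexp(fNames{k}, ['^' fileName '_(\d+)$'], 'tokens', 'once');
  if( ~isempty(tok) )
    m = max(m, str2double(tok{1}));
  end
end

newName = sprintf('%s_%d', fileName, m + 1);
disp(newName)
```

Here the next file would be named Ryoko_11. Taking the maximum makes the result independent of the order in which dir returns the files.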


7.8 Making the IMU Belt 


7.8.1 Problem 
We need to attach the IMU to our dancer. 


7.8.2 Solution 


We use the arm strap that is available from the manufacturer. We buy an elastic belt and make 
one that fits around the dancer’s waist. 


7.8.3 How It Works 


Yes, software engineers need to sew. Figure 7.10 shows the process. The two products used to 
make the data acquisition belt are 


1. LPMS-B2 Holder (available from Life Performance Research) 


2. Men’s No Show Elastic Stretch Belt Invisible Casual Web Belt Quick Release Flat Plastic 
Buckle (available from Amazon) 


Remove the holder from the LPMS-B2 Holder. Cut the belt at the buckle and slide the 
holder onto the belt. Sew the belt at the buckle. 

The sensor on a dancer is shown in Figure 7.11. We had the dancer stand near the laptop
during startup. We didn't have any range problems during the experiments. We didn't try it
with across-the-floor movement, as one would have during grand allegro.


Figure 7.10: Elastic belt manufacturing. We use the two items on the left to make the one on 
the right. 
Figure 7.11: Dancer with the sensor belt. The blue light means it is collecting data.


7.9 Testing the System 
7.9.1 Problem 


We want to test the data acquisition system. This will find any problems with the data acquisi- 
tion process. 


7.9.2 Solution 


Have a dancer do changements, which are small jumps changing the foot position on landing. 


7.9.3 How It Works 


The dancer puts on the sensor belt, we push the calibrate button, and then she does a series of
changements. She stands about 2 m from the computer to make acquisition easier. A changement
is a small jump in which the feet change positions, starting from fifth position. If the right
foot is in front in fifth position at the start, it is in back at the finish. Photos are shown
in Figure 7.12.

The time scale is a bit long. You can see that the calibration does not lead us to a natural 
orientation in the axis system in the GUI. It doesn’t matter from a data collection point of view 
but is an improvement we should make in the future. The changement is shown in Figure 7.13. 
The dancer is still at the beginning and end. 

The interface to the bluetooth device doesn’t do any checking or stream control. Some blue- 
tooth data collection errors occur from time to time. Typically, they happen after 40 seconds of 
data collection. 
Figure 7.12: Dancer doing a changement. Notice the feet when she is preparing to jump. The
second image shows her feet halfway through the jump.


Figure 7.13: Data collected during the changement.


Index exceeds the number of array elements (0).
Error in instrhwinfo>bluetoothCombinedDevices (line 976)
    uniqueBTName = allBTName(uniqueRowOrder);
Error in instrhwinfo (line 206)
    tempOut = bluetoothCombinedDevices(tempOut);
Error in DataAcquisition (line 13)
    btInfo = instrhwinfo('Bluetooth');


If this happens, turn the device on and off. Restart MATLAB if that doesn't work. 
Another bluetooth error is 


ans =
  1x1 cell array
    {'LPMSB2-4B31D6'}
ans =
  1x1 cell array
    {'btspp://00043E4B31D6'}
Error using Bluetooth (line 104)
Cannot Create: Java exception occurred:
java.lang.NullPointerException
    at com.mathworks.toolbox.instrument.BluetoothDiscovery.searchDevice(BluetoothDiscovery.java:395)
    at com.mathworks.toolbox.instrument.BluetoothDiscovery.discoverServices(BluetoothDiscovery.java:425)
    at com.mathworks.toolbox.instrument.BluetoothDiscovery.hardwareInfo(BluetoothDiscovery.java:343)
    at com.mathworks.toolbox.instrument.Bluetooth.<init>(Bluetooth.java:205).


This is a MATLAB error and requires restarting MATLAB. It doesn't happen very often. We
ran the entire data collection with four dancers doing ten pirouettes each without ever experi- 
encing the problem. 


7.10 Classifying the Pirouette 
7.10.1 Problem 


We want to classify the pirouettes of our four dancers. 


7.10.2 Solution 


Create an LSTM that classifies pirouettes according to dancer. The four labels are the dancer 
names. 
Figure 7.14: A pirouette. Angular rate and linear acceleration are shown.

7.10.3 How It Works


The script takes one file and displays it. Figure 7.14 shows a double pirouette. A turn takes
only a few seconds.


dancer = {'Ryoko' 'Shaye', 'Emily', 'Matanya'};

%% Show one dancer's data
cd TestData
s   = load('Ryoko_10.mat');
yL  = {'\omega_x' '\omega_y' '\omega_z' 'a_x' 'a_y' 'a_z'};
% The next line is partly illegible in the source listing; its arguments are assumed
PlotSet( s.time, s.state(5:10,:), 'x label', 'Time (s)', 'y label', yL,...
         'figure title', dancer{1} );


We load in the data and limit the range to 6 seconds. Sometimes the IMU would run longer
due to human error. We also remove sets that are bad.


DancerNN.m

%% Load in and process the data
n = 40;
% Get the data and remove bad data sets
i = 0;
for k = 1:length(dancer)
  for j = 1:10
    s   = load(sprintf('%s_%d.mat',dancer{k},j));
    cS  = size(s.state,2);
    if( cS > 0 )
      i       = i + 1;
      d{i,1}  = s.state; %#ok<*SAGROW>
      t{i,1}  = s.time;
      c(i,1)  = k;
    end
  end
end

fprintf('%d remaining data sets out of %d total.\n',i,n)

for k = 1:4
  j = length(find(c==k));
  fprintf('%7s data sets %d\n',dancer{k},j)
end

n = i;

cd ..

% Limit the range to 6 seconds
tRange = 6;
for i = 1:n
  % Elapsed time is measured from the first sample of each set
  j = find(t{i} - t{i}(1) > tRange );
  if( ~isempty(j) )
    d{i}(:,j(1)+1:end) = [];
  end
end

We then train the neural network. We use a bidirectional LSTM to classify the sequences.
There are ten features: four quaternion elements, three rate gyro measurements, and three
accelerometer measurements. The four quaternion elements are coupled through the relationship

$$1 = q_1^2 + q_2^2 + q_3^2 + q_4^2 \tag{7.19}$$

However, this should not impact the learning accuracy aside from slowing down the learning.

Some data sets didn't have any data and are removed as the files are loaded. As noted above,
the range is limited to 6 seconds because the data collection sometimes did not stop after the
pirouette ended. The network is then set up.

%% Set up the network
numFeatures     = 10; % 4 quaternion, 3 rate gyros, 3 accelerometers
numHiddenUnits  = 400;
numClasses      = 4;  % Four dancers

layers = [ ...
    sequenceInputLayer(numFeatures)
    bilstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
disp(layers)

options = trainingOptions('adam', ...
    'MaxEpochs',60, ...
    'GradientThreshold',1, ...
    'Verbose',0, ...
    'Plots','training-progress');


We then train the neural network. A bidirectional LSTM is a good choice for classifying the
sequences because we have access to the full sequence. For a classifier using
bilstmLayer, we must set 'OutputMode' to 'last'. This is followed by a fully
connected layer, a softmax layer that normalizes the outputs into class probabilities, and
finally the classification layer.
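What the softmax layer computes can be shown with a few lines that need no toolbox. This is a minimal sketch with made-up scores standing in for the output of the fully connected layer; it is not the toolbox implementation.

```matlab
% Hypothetical scores from the fully connected layer, one per dancer
z = [2.0; 1.0; 0.1; -1.0];

% Softmax: exponentiate and normalize; subtracting max(z) avoids overflow
e = exp(z - max(z));
p = e/sum(e);

[~, kMax] = max(p);     % index of the most probable class

disp(p')
disp(kMax)
```

The elements of p are positive and sum to one, so they can be read as class probabilities. The classification layer then picks the class with the largest probability, here the first.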
%% Train the network
nTrain = 30;
kTrain = randperm(n,nTrain);
xTrain = d(kTrain);
yTrain = categorical(c(kTrain));
net    = trainNetwork(xTrain,yTrain,layers,options);

%% Test the network
kTest = setdiff(1:n,kTrain);
xTest = d(kTest);
yTest = categorical(c(kTest));
yPred = classify(net,xTest);

% Calculate the classification accuracy of the predictions.
acc = sum(yPred == yTest)./numel(yTest);
disp('Accuracy')
disp(acc);


>> DancerNN
36 remaining data sets out of 40 total.
  Ryoko data sets 6
  Shaye data sets 10
  Emily data sets 10
Matanya data sets 10
  5x1 Layer array with layers:

     1   ''   Sequence Input          Sequence input with 10 dimensions
     2   ''   BiLSTM                  BiLSTM with 400 hidden units
     3   ''   Fully Connected         4 fully connected layer
     4   ''   Softmax                 softmax
     5   ''   Classification Output   crossentropyex

The training GUI is shown in Figure 7.15. It converges fairly well.
We test the neural network against the unused data.

Accuracy
    0.8333
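The split between training and test sets, and the accuracy calculation, can be sketched without the Deep Learning Toolbox. The labels and predictions below are made up; only the counts (36 usable sets, 30 for training) come from the run above.

```matlab
% Hypothetical labels for 36 sequences from 4 dancers
n = 36;
c = repmat((1:4)', 9, 1);

nTrain = 30;
kTrain = randperm(n, nTrain);       % 30 distinct indices for training
kTest  = setdiff(1:n, kTrain);      % the remaining 6 are held out

% Stand-in predictions: pretend the classifier gets all but one test case right
yTest    = c(kTest);
yPred    = yTest;
yPred(1) = mod(yPred(1), 4) + 1;    % force one wrong label

acc = sum(yPred == yTest)/numel(yTest);
disp(acc)
```

With only six held-out sequences, the accuracy can change only in steps of 1/6. An accuracy of 0.8333 therefore corresponds to five of six test pirouettes classified correctly, and a different random split could give a noticeably different number.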
Figure 7.15: Neural net training.


Table 7.3: Hardware.

Component  | Supplier         | Item                                                     | Price
IMU        | LP-Research Inc. | LPMS-B2: 9-Axis Inertial Measurement Unit                | $299.00
IMU Holder | LP-Research Inc. | LPMS-B2: Holder                                          | $30.00
Belt       | Amazon           | Men's Elastic Stretch Belt Invisible Casual Trousers     | $10.99
           |                  | Webbing Belt Plastic Buckle Black Fits 24" to 42"        |


The result, > 80%, is pretty good considering the limited amount of data. Four Ryoko sets
were lost due to errors in data collection. It is interesting that the deep learning network could
distinguish the dancers' pirouettes. The data itself did not show any easy-to-spot differences.
Calibration could have been done better to make the data more consistent between dancers. It
would have been interesting to collect data on multiple days. Other experiments would be to
classify pirouettes done in pointe shoes and without. We might also have had the dancers do
different types of turns to see if the network could still identify the dancer.


7.11 Hardware Sources 


Table 7.3 gives the hardware used in this chapter along with the prices (in US dollars) at the 
time of publication. 
Completing Sentences


8.1 Introduction 


8.1.1 Sentence Completion 


Completing sentences is a useful feature for text entry systems. Given a set of possible
sentences, we want the system to predict a missing part of the sentence. We will use the Microsoft
Research Sentence Completion Challenge [31]. It is a database of 1040 sentences, each of which has four
imposter sentences and one correct sentence. Each imposter sentence differs from the correct 
sentence by one word in a fixed position. The deep learning system should identify the correct 
word in the sentence. Imposter words have similar occurrence statistics. The sentences were 
selected from Sherlock Holmes novels. The imposter words were generated using a language 
model trained using over five hundred nineteenth-century novels. Thirty alternative words for 
the correct word were produced. Human judges picked the four best imposter words from the 
30 alternatives. The database can be downloaded from Google Drive [20]. 

The first question in the database and the five answers, including the four imposters, are 
given as follows. 


I have it from the same source that you are both an orphan and a bachelor
and are _____ alone in London.
a) crying b) instantaneously c) residing d) matched e) walking


(b) and (d) don't fit grammatically. (a) and (e) are incompatible with the beginning, in
which the speaker is recounting general information about the subject's state. (c) makes the
most sense. If after "are" we had "often seen," then (a) and (e) would be possibilities and
(c) would no longer make sense. You would need additional information to determine if (a) or
(e) were correct.
8.1.2 Grammar


The structure of a language, that is, its grammar, is important in interpreting sentences.
Since not all of our readers speak English as their primary language,
we'll give some examples in other languages.

In Russian, the word order is not fixed. You can always figure out the words, and whether 
they are adjectives, verbs, nouns, and so forth from the declension and conjugation, but the 
word order is important as it determines the emphasis. For example, to say, "I am an engineer"
in Russian:


Я инженер

We could reverse the order:

Инженер я


which would mean the emphasis is on "engineer" not "I". While it is easy to know that
the sentence is stating that "I am an engineer," we don't necessarily know how the speaker
feels about it. This may not be important in rote translation but certainly makes a difference in
literature.

Japanese is known as a subject-object-verb language. In Japanese the verb is at the end of 
the sentence. Japanese also makes use of particles to denote word function such as subject or 
object. For the sentence completion problem, the particle would denote the function of the word. 
The rest of the sentence would determine what the word may mean. Here are some particles: 


は "wa/ha" indicates the topic, which could be the object or subject.
を "wo/o" indicates the object.
が "ga" indicates the subject.


For example, in Japanese

私はエンジニアです

or "watashi wa enjinia desu" means, "I am an engineer." は is the topic marker pointing to
"I". です is the verb. We'd need other sentences to predict the 私, "I", or "engineer".

Japanese also has the feature where everything, except the verb, can be omitted.

いただきます

or "i ta da ki ma su." This means "I will eat" whatever is given. You need to know the
context or have other sentences to understand what is meant by the sentence.

In addition, in Japanese, many different Kanji, or symbols, can mean approximately the
same thing, but the emphasis will be different. Other Kanji have different meanings depending
on context. Japanese also does not have any spaces between words. You just have to know
when a kana character, like は, is part of the preceding Kanji.
8.1.3 Sentence Completion by Pattern Recognition


Our approach is sentence completion by pattern recognition. Given a database of your sen- 
tences, the pattern recognition algorithm should be able to recognize the patterns you use and 
find errors. Also, in most languages, dialogs between people use far fewer words and simpler 
structures than the written languages. You will notice this if you watch a movie in a foreign 
language for which you have a passable knowledge. You can recognize a lot more than you 
would expect. Russian is an extreme in this regard; it is very hard to build vocabulary from 
reading because the language is so complex. Many Russian teachers teach the root system so 
that you can guess word meaning without constantly referring to a dictionary. Using word roots 
and sentence structure to guess words is a form of sentence completion. We'll leave that to our 
Russian readers. 


8.1.4 Sentence Generation 


As an aside, sentence completion leads to generative deep learning [12]. In generative deep 
learning, the neural network learns patterns and then can create new material. For example, a 
deep learning network might learn how a newspaper article is written and be able to generate 
new articles given basic facts the article is supposed to present. This is not a whole lot different 
than when writers are paid to write new books in a series such as Tom Swift or Nancy Drew. 
Presumably, the writer adds his or her personality to the story, but perhaps a reader, who just 
wants a page turner, wouldn't really care. 


8.2 Generating a Database of Sentences 
8.2.1 Problem 


We want to create a set of sentences accessible from MATLAB. 


8.2.2 Solution 


Read in the sentences from the database. Write a function to read in tab-separated text. 


8.2.3 How It Works 


The database that we downloaded from Google Drive was an Excel csv file. We need to first
open the file and save it as tab-delimited text. Once this is done, you are ready to read it into
MATLAB. We do this for both test_answer.csv and testing_data.csv. We manually
removed the first column in test_answer.csv in Excel because it was not needed.
Only the txt files that we generated are needed in this book.

If you have the Statistics and Machine Learning Toolbox, you could use tdfread. We'll
write the equivalent. There are four outputs shown in the header. They are the sentences, the
range of characters where the word needed for completion fits, the five possible words, and the
answer.

We open the file using f = fopen('testing_data.txt','r');. The 'r' opens the file for
reading. We search for tabs and add the end of the line so that we can find the last
word. The second part reads in the test answers and converts them from a character to a number.
We removed all extraneous quotes from the text file with a text editor.


ReadDatabase.m

f = fopen('testing_data.txt','r');

% We know the size of the file simplifying the code.
u = zeros(1040,2);
a = zeros(1040,1);
s = strings(1040,1);
v = strings(1040,5);
t = sprintf('\t');
k = 1;

% Read in the sentences and words
while(~feof(f))
  q     = fgetl(f);                    % This is one line of text
  j     = [strfind(q,t) length(q)+1];  % This finds tabs that delimit words
  s(k)  = convertCharsToStrings(q(j(1)+1:j(2)-1)); % Convert to strings
  for i = 1:5
    v(k,i) = convertCharsToStrings(q(j(i+1)+1:j(i+2)-1)); % Make strings
  end
  ul      = strfind(s(k),'_');  % Find the space where the answers go
  u(k,:)  = [ul(1) ul(end)];    % Get the range of characters for the answer
  k       = k + 1;
end

fclose(f);

% Read in the test answers
f = fopen('test_answer.txt','r');

k = 1;
while(~feof(f))
  q       = fgetl(f);
  a(k,1)  = double(q)-96;
  k       = k + 1;
end

fclose(f);
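The tab search can be tried on one line of text without the database file. This is a minimal sketch: the line below is made up, its layout (an index, the sentence, then five candidate words) is inferred from the indexing in the listing above, and char arrays and a cell array are used in place of strings to keep the example small.

```matlab
% Hypothetical line in the same layout: index, sentence, five candidate words
t = sprintf('\t');
q = ['1' t 'The cat _____ on the mat.' t 'sat' t 'blue' t 'quickly' t 'under' t 'tree'];

j = [strfind(q,t) length(q)+1];      % tab positions plus a virtual tab at end of line
sentence = q(j(1)+1:j(2)-1);         % text between the first and second tabs

words = cell(1,5);
for i = 1:5
  words{i} = q(j(i+1)+1:j(i+2)-1);   % each candidate sits between consecutive tabs
end

ul = strfind(sentence,'_');          % underscores mark the missing word
u  = [ul(1) ul(end)];                % first and last character of the blank

answer = double('c') - 96;           % the letter 'c' maps to 3, as in the answer file
```

Appending length(q)+1 to the list of tab positions lets the last word be extracted with the same indexing as the others. Subtracting 96 works because the character code of 'a' is 97.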


If we run the function, we get the following outputs.


>> [s,u,v,a] = ReadDatabase;
>> s(1)
ans =
    "I have it from the same source that you are both an orphan and a
    bachelor and are _____ alone in London."
>> v(1,:)
ans =
  1x5 string array
    "crying"    "instantaneously"    "residing"    "matched"    "walking"
>> a(1)
ans =
     3


All outputs (except for the character range and the answer number) are strings.
convertCharsToStrings does the conversion. Now that we have all of the data in MATLAB,
we are ready to train the system to determine the best word for each sentence. As an
intermediate step, we will convert the words to numbers.


8.3 Creating a Numeric Dictionary 
8.3.1 Problem 


We want to create a numeric dictionary to speed neural net training. This eliminates the need 
for string matching during the training process. Expressing a sentence as a numeric sequence as 
opposed to a sequence of character arrays (words) essentially gives us a more efficient way to 
represent the sentence. This will become useful later when we perform machine learning over 
a database of sentences to learn valid and invalid sequences. 


8.3.2 Solution 
Write a MATLAB function, DistinctWords, to search through text and find unique words. 


8.3.3 How It Works 


The function removes punctuation using erase in the following lines of code. 


DistinctWords.m

% Remove punctuation
w = erase(w,';');
w = erase(w,',');
w = erase(w,'.');


It then uses split to break up the string and finds unique strings using unique.


% Find unique words
s = split(w)';
d = unique(s);


This is the built-in demo. It finds 38 unique words. 


>> DistinctWords
w =
    "No one knew it then, but she was being held under a type of house
    arrest while the tax authorities scoured
    the records of her long and lucrative career as an actress, a
    luminary of the red carpet, a face of luxury
    brands and a successful businesswoman."
d =
  1x38 string array
  Columns 1 through 12
    "No"    "one"    "knew"    "it"    "then"    "but"    "she"    "was"
    "being"    "held"    "under"    "type"
  Columns 13 through 22
    "house"    "arrest"    "while"    "tax"    "authorities"    "scoured"
    "records"    "her"    "long"    "lucrative"
  Columns 23 through 33
    "career"    "as"    "an"    "actress"    "luminary"    "the"    "red"
    "carpet"    "face"    "of"    "luxury"
  Columns 34 through 38
    "brands"    "and"    "a"    "successful"    "businesswoman"
n =
  Columns 1 through 20
     1     2     3     4     5     6     7     8     9    10    11    36
    12    32    13    14    15    28    16    17
  Columns 21 through 40
    18    28    19    32    20    21    35    22    23    24    25    26
    36    27    32    28    29    30    36    31
  Columns 41 through 47
    32    33    34    35    36    37    38


d is a string array and maps onto array n. 
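
The relationship between d and n can be sketched with the three-output form of unique. This is only an illustration under assumptions: the sentence is made up, and the 'stable' option (first-occurrence order) is a choice made here, not necessarily the ordering that DistinctWords produces.

```matlab
w = "the cat saw the dog and the dog saw the cat";
s = split(w)';                      % 1-by-N string array of words

% 'stable' keeps first-occurrence order; n indexes into d for every word
[d,~,n] = unique(s,'stable');
n = n';

rebuilt = join(d(n));               % map the numbers back to words
```

Indexing the dictionary with the numeric sequence, d(n), recovers the original words, which is what makes the numeric form a lossless stand-in for the sentence.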


8.4 Map Sentences to Numbers 
8.4.1 Problem 


We want to map sentences to unique numbers. 


8.4.2 Solution 


Write a MATLAB function to search through text and assign a unique number to each word. 


8.4.3 How It Works 


The function splits the string into words and looks up each word in the dictionary d. The
last line removes any words (in this case, only punctuation) that are not in the dictionary.


MapToNumbers.m

function n = MapToNumbers( w, d )

% Demo
if( nargin < 1 )
  Demo;
  return
end

w = erase(w,';');
w = erase(w,',');
w = erase(w,'.');
s = split(w)'; % string array

n = zeros(1,length(s));
for k = 1:length(s)
  ids = find(strcmp(s(k),d));
  if ~isempty(ids)
    n(k) = ids;
  end
end


This is the built-in demo. 


>> MapToNumbers
w =
    "No one knew it then, but she was being held under a type of house
    arrest while the tax authorities scoured the records of her long
    and lucrative career as an actress, a luminary of the red carpet,
    a face of luxury brands and a successful businesswoman."
n =
  Columns 1 through 19
     1     2     3     4     0     6     7     8     9    10    11    36
    12    32    13    14    15    28    16
  Columns 20 through 38
    17    18    28    19    32    20    21    35    22    23    24    25
     0    36    27    32    28    29     0
  Columns 39 through 46
    36    31    32    33    34    35    36    37


8.5 Converting the Sentences 
8.5.1 Problem 


We want to convert the sentences to numeric sequences. 


8.5.2 Solution 


Write a MATLAB script to take each sentence, insert each of the candidate words, and create a
numeric sequence. Each sentence is classified as correct or incorrect.


8.5.3 How It Works


The script reads in the database. It creates a numeric dictionary for all of the sentences and then
converts them to numbers. This first part creates the candidate sentences, five per question,
which gives 5200 sentences for the full database of 1040 questions. Each is classified as correct
or incorrect. Note how we initialize a string array.


PrepareSequences.m

%% See also
% ReadDatabase, extractBefore, extractAfter, MapToNumbers

[s,u,v,a] = ReadDatabase;

% Whatever you want in the training
nSentences = 100; %length(s);

i = 1;
c = zeros(size(v,2)*nSentences,1);
z = strings(size(v,2)*nSentences,1);
for k = 1:nSentences
  q1 = extractBefore(s(k),u(k,1));
  q2 = extractAfter(s(k),u(k,2));
  for j = 1:size(v,2)
    z(i) = q1 + v(k,j) + q2;
    if( j == a(k) )
      c(i) = 1;
    else
      c(i) = 0;
    end
    i = i + 1;
  end
end


The next section concatenates all of the sentences into a gigantic string and creates a 
dictionary. 


%% Create a numeric dictionary
r = z(1);
for k = 2:length(z)
  r = r + " " + z(k); % append all the sentences to one string
end

d = DistinctWords( r ); % find the distinct words


The final part creates the numeric sentences and saves them. The loop that prints the lines 
shows a handy way of printing an array using fprintf. 


for k = 1:length(z)
  nZ{k} = MapToNumbers( z(k), d );
end

% Print 2 sentences
for k = 1:10
  fprintf('Category: %d',c(k));
  fprintf('%5d',nZ{k})
  fprintf('\n')
  if( mod(k,5) == 0 )
    fprintf('\n')
  end
end

%% Save the numbers and category in a mat-file

save('Sentences','nZ','c');
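
The array-printing behavior relied on above can be seen in isolation with sprintf, which follows the same formatting rules as fprintf. The sample values are made up for illustration.

```matlab
nZ  = [1 22 333];
txt = sprintf('%5d',nZ);   % the format is reused for every element of nZ
```

Because the single %5d specifier is recycled for each element, one call prints the whole sequence in fixed-width columns without a loop.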


As expected, only one word is different in each set of five sentences. 


>> PrepareSequences
Category: 0 ...    7  170    8  546  544    9  546   10    2   12  404
Category: 0 ...    7  170    8  546  544    9  546   10    3   12  404
Category: 1 ...    7  170    8  546  544    9  546   10    4   12  404
Category: 0 ...    7  170    8  546  544    9  546   10    5   12  404
Category: 0 ...    7  170    8  546  544    9  546   10  ...   12  404

Category: 1 ...   22   14  404   24   25  546
Category: 0 ...   22   15  404   24   25  546
Category: 0 ...   22   16  404   24   25  546
Category: 0 ...   22   17  404   24   25  546
Category: 0 ...   22   23  404   24   25  546


8.6 Training and Testing 
8.6.1 Problem 


We want to build a deep learning system to complete sentences. The idea is that the full database
of correct and incorrect sentences provides enough information for the neural net to deduce the
grammar and meaning.


8.6.2 Solution


Write a MATLAB script to implement an LSTM to classify the sentences as correct or incorrect.
The LSTM will be trained with complete sentences. No information about the words, such as
whether a word is a noun, verb, or adjective, nor about any grammatical structure will be used.


8.6.3 How It Works 


We will produce the simplest possible design. It will read in sentences, classified as correct
or incorrect, and attempt to determine if new sentences are correct or incorrect just from the
learned patterns. This is a very simple and crude approach. We aren't taking advantage of our
knowledge of grammar, word types (verb, noun, etc.), or context to help with the predictions.
Language modeling is a huge field, and we are not using any results from that body of work. Of
course, applying all of the rules of grammar doesn't necessarily ensure success; otherwise
there would be more 800s on the SAT verbal tests.
We reuse the printing code from PrepareSequences to make sure the loaded sequences are valid.


SentenceCompletionNN.m

%% Load the data
s = load('Sentences');
n = length(s.c); % number of sentences

% Make sure the sequences are valid. One in every 5 is complete.
for k = 1:10
  fprintf('Category: %d',s.c(k));
  fprintf('%5d',s.nZ{k})
  fprintf('\n')
  if( mod(k,5) == 0 )
    fprintf('\n')
  end
end

%% Set up the network


Each set has one correct sentence. The remainder have the wrong answers. 


>> SentenceCompletionNN
Category: 0 ...    7  170    8  546  544    9  546   10    2   12  404
Category: 0 ...    7  170    8  546  544    9  546   10    3   12  404
Category: 1 ...    7  170    8  546  544    9  546   10    4   12  404
Category: 0 ...    7  170    8  546  544    9  546   10    5   12  404
Category: 0 ...    7  170    8  546  544    9  546   10  ...   12  404

Category: 1 ...   22   14  404   24   25  546
Category: 0 ...   22   15  404   24   25  546
Category: 0 ...   22   16  404   24   25  546
Category: 0 ...   22   17  404   24   25  546
Category: 0 ...   22   23  404   24   25  546


Because we have access to full sequences at prediction time, we use a bidirectional LSTM
layer in the network. A bidirectional LSTM layer learns from the full sequence at each time
step. The training code follows. We convert the classes, 0 and 1, to a categorical variable.


numFeatures    = 1;
numHiddenUnits = 400;
numClasses     = 2;

layers = [ ...
  sequenceInputLayer(numFeatures)
  bilstmLayer(numHiddenUnits,'OutputMode','last')
  fullyConnectedLayer(numClasses)
  softmaxLayer
  classificationLayer];

disp(layers)

options = trainingOptions('adam', ...
  'MaxEpochs',60, ...
  'MiniBatchSize',20,...
  'GradientThreshold',1, ...
  'SequenceLength','longest', ...
  'Shuffle','never', ...
  'Verbose',1, ...
  'InitialLearnRate',0.01,...
  'Plots','training-progress');

%% Train the network - Uniform set
nSentences = n/5; % number of complete sentences in the database
nTrain     = floor(0.75*nSentences); % use 75% for training
xTrain     = s.nZ(1:5*nTrain); % sentence indices, in order
yTrain     = categorical(s.c(1:5*nTrain)); % complete or not?
net        = trainNetwork(xTrain,yTrain,layers,options);


The output is 


  5x1 Layer array with layers:

     1   ''   Sequence Input          Sequence input with 1 dimensions
     2   ''   BiLSTM                  BiLSTM with 400 hidden units
     3   ''   Fully Connected         2 fully connected layer
     4   ''   Softmax                 softmax
     5   ''   Classification Output   crossentropyex
Uniform set
    0.8000
Random set
    0.6176


The first layer says the input is a one-dimensional sequence. The second is the bidirectional 
LSTM. The next layer is a fully connected layer of neurons. This is followed by a Softmax 
layer and then by the classification layer. The standard Softmax is

$$\sigma_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} \tag{8.1}$$

which is essentially a normalized output.
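
To make the normalization concrete, here is a minimal numerical sketch of the Softmax. The input vector z is made up, and subtracting the maximum before exponentiating is a common numerical safeguard added here, not something taken from the text; it does not change the result.

```matlab
z     = [2.0; 1.0; 0.1];     % hypothetical inputs to the Softmax layer
e     = exp(z - max(z));     % subtracting the max avoids overflow
sigma = e/sum(e);            % outputs are positive and sum to one
```

The largest input always receives the largest output, so the predicted class is simply the index of the maximum of sigma.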
The testing code is

%% Train the network - Uniform set
nSentences = n/5; % number of complete sentences in the database
nTrain     = floor(0.75*nSentences); % use 75% for training
xTrain     = s.nZ(1:5*nTrain); % sentence indices, in order
yTrain     = categorical(s.c(1:5*nTrain)); % complete or not?
net        = trainNetwork(xTrain,yTrain,layers,options);

% Test this network - 80% accuracy
xTest = s.nZ(5*nTrain+1:end);
yTest = categorical(s.c(5*nTrain+1:end));
yPred = classify(net,xTest);

% Calculate the classification accuracy of the predictions.
acc = sum(yPred == yTest)./numel(yTest);
disp('Uniform set')
disp(acc);

%% Train the network using randomly selected sentences
kTrain = randperm(n,5*nTrain); % 5*nTrain random indices in range 1:n
xTrain = s.nZ(kTrain);
yTrain = categorical(s.c(kTrain));
net    = trainNetwork(xTrain,yTrain,layers,options);

% Test the network
kTest = setdiff(1:n,kTrain);
xTest = s.nZ(kTest);
yTest = categorical(s.c(kTest));
yPred = classify(net,xTest);

% Calculate the classification accuracy of the predictions.
acc = sum(yPred == yTest)./numel(yTest);
disp('Random set')
disp(acc);


Figure 8.1 shows learning with a uniform set. All five sentences are input; only one has 
the correct answer. The learning system quickly figures out that it can get 80% accuracy by 
classifying each sentence as wrong. 
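
The 80% figure is just the share of incorrect sentences in each set of five, which a short sketch can confirm. The label vector below is made up (20 sets with the correct sentence always in third place); only the one-in-five ratio matters.

```matlab
c     = repmat([0;0;1;0;0],20,1);   % hypothetical labels, one correct per set of five
yPred = zeros(size(c));             % a classifier that always answers "incorrect"
acc   = sum(yPred == c)/numel(c);   % fraction classified correctly
```

Any accuracy figure for this problem should therefore be read against this 0.8 baseline rather than against 0.5.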

Figure 8.2 shows learning with a random set. The sentences are drawn in random order
from the entire database. Training accuracy now reaches 93%. If you run it multiple times, you
will see that the results vary.

Figure 8.3 shows a second run. The random sets produce networks with lower test accuracy
(0.6176 in the run listed earlier, versus 0.8000 for the uniform set), but at least the network
is trying to find correct sentences. This particular approach is very simple. Nonetheless, it
shows the potential for working with text and ultimately understanding written language.


Figure 8.1: Learning with a uniform input of sentences.

Figure 8.2: Learning with a random set of sentences.

Figure 8.3: Learning with a different random set of sentences. This reaches 95% accuracy with
the training set.


CHAPTER 9


Terrain-Based Navigation 


9.1 Introduction 


Prior to the widespread availability of GPS, Loran, and other electronic navigation aids, pilots 
used visual cues from terrain to navigate. Now everyone uses GPS. We want to return to the 
good old days of terrain-based navigation. We will design a system that will be able to match 
terrain with a database. It will then use that information to determine where it is flying. 


9.2 Modeling Our Aircraft 
9.2.1 Problem 


We want a three-dimensional aircraft model that can change direction. 


9.2.2 Solution 


Write the equations of motion for three-dimensional flight. 


9.2.3 How It Works 


The motion of a point mass through three-dimensional space has 3 degrees of freedom. Our
aircraft model is therefore given 3 degrees of spatial freedom. The velocity vector is expressed
as a wind-relative magnitude (V) with directional components for heading (ψ) and flight path
angle (γ). The position is a direct integral of the velocity and is expressed in y = North, x =
East, h = Vertical coordinates. In addition, the engine thrust is modeled as a first-order system
where the time constant can be changed to approximate the engine response times of different
aircraft.

Figure 9.1 shows a diagram of the velocity vector in the North-East-Up coordinate system.
The time derivatives are taken in this frame. This is not a purely inertial coordinate system,
because it is rotating with the Earth. However, the rate of rotation of the Earth is sufficiently
small compared to the aircraft turning rates so that it can be safely neglected.

Figure 9.1: Velocity in North-East-Up coordinates. [Diagram labels: heading ψ and flight path
angle γ; velocity components V cos γ cos ψ (North), V cos γ sin ψ (East), and V sin γ
(Vertical); horizontal projection V cos γ.]


The point mass aircraft equations of motion are

$$\dot{v} = \frac{T\cos\alpha - D - mg\sin\gamma}{m} + f_v \tag{9.1}$$
$$\dot{\gamma} = \frac{1}{mv}\left((L + T\sin\alpha)\cos\phi - mg\cos\gamma + f_\gamma\right) \tag{9.2}$$
$$\dot{\psi} = \frac{1}{mv\cos\gamma}\left((L + T\sin\alpha)\sin\phi - f_\psi\right) \tag{9.3}$$
$$\dot{x}_e = v\cos\gamma\sin\psi + W_x \tag{9.4}$$
$$\dot{y}_n = v\cos\gamma\cos\psi + W_y \tag{9.5}$$
$$\dot{h} = v\sin\gamma + W_h \tag{9.6}$$
$$\dot{m} = -\frac{T}{u_e} \tag{9.7}$$

where v is the true airspeed, T is the thrust, L is the lift, D is the drag, g is the acceleration
of gravity, γ is the air-relative flight path angle, ψ is the air-relative heading (measured
clockwise from North), φ is the bank angle, x_e and y_n are the East and North positions,
respectively, and h is the altitude. The mass m is the total of dry mass and fuel mass. The terms
{f_v, f_γ, f_ψ} represent additional forces due to modeling uncertainty, and the terms
{W_x, W_y, W_h} are wind speed components. If the vertical wind speed is zero, then γ = 0
produces level flight. α, φ, and T are the controls.
Figure 9.2 shows the longitudinal symbols for the aircraft. γ is the angle between the velocity
vector and local horizontal. α is the angle of attack, which is between the nose of the aircraft
and the velocity vector. The wings may be oriented, or have airfoils, that give lift at zero angle
of attack. Drag is opposite velocity and lift is perpendicular to drag. Lift must balance gravity
and any downward component of drag; otherwise the aircraft will descend.

Figure 9.2: Aircraft model showing lift, drag, and gravity. [Diagram labels: lift L, nose line,
local horizontal.]
We are using a very simple aerodynamic model. The lift coefficient is defined as

$$c_L = c_{L_\alpha}\alpha \tag{9.8}$$

The lift coefficient is really a nonlinear function of angle of attack. It has a maximum angle
of attack above which the wing stalls and lift drops off sharply. For a flat plate,
$c_{L_\alpha} = 2\pi$. The drag coefficient is

$$c_D = c_{D_0} + \frac{c_L^2}{\pi A_R e} \tag{9.9}$$

where A_R is the aspect ratio and e is the Oswald efficiency factor, which is typically from 0.8
to 0.95. The efficiency factor is how efficiently lift is coupled to drag. If it is less than one, it
means that lift produces more lift-induced drag than the ideal. The aspect ratio is the ratio of
the wing span (from the point nearest the fuselage to the tip) and the chord (the length from the
front to the back of the wing).


The dynamic pressure, the pressure due to the motion of the aircraft, is

$$q = \frac{1}{2}\rho v^2 \tag{9.10}$$

where v is the speed and ρ is the atmospheric density. This is the pressure on your hand if you
stick it out of the window of a moving car. The lift and drag forces are

$$L = q c_L s \tag{9.11}$$
$$D = q c_D s \tag{9.12}$$

where s is the wetted area. The wetted area is the surface of the aircraft that produces lift and
drag. We make it the same for lift and drag, but in a real aircraft, some parts of the aircraft cause
drag (like the nose) but don't produce any lift. In essence, we assume the aircraft is all wing.
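
Equations 9.8 through 9.12 chain together directly, as the following sketch shows. Every numeric value here is a made-up placeholder chosen only to exercise the formulas; none is a Gulfstream parameter from the text.

```matlab
% Hypothetical parameters (for illustration only)
rho     = 0.4;       % atmospheric density (kg/m^3)
v       = 250;       % true airspeed (m/s)
s       = 100;       % wetted area (m^2)
alpha   = 0.04;      % angle of attack (rad)
cLAlpha = 2*pi;      % flat plate lift curve slope
cD0     = 0.01;      % zero-lift drag coefficient
aR      = 6;         % aspect ratio
e       = 0.9;       % Oswald efficiency factor

q    = 0.5*rho*v^2;              % dynamic pressure, Equation 9.10
cL   = cLAlpha*alpha;            % lift coefficient, Equation 9.8
cD   = cD0 + cL^2/(pi*aR*e);     % drag coefficient, Equation 9.9
lift = q*cL*s;                   % Equation 9.11
drag = q*cD*s;                   % Equation 9.12
```

Because lift and drag share the factor q*s, the lift-to-drag ratio depends only on the coefficients, not on speed or density.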


We create a right-hand side function for the model. This will be called by the numerical 
integration function. The following has the dynamical model. 


RHSPointMassAircraft.m

function [xDot, lift, drag] = RHSPointMassAircraft( ~, x, d )

if( nargin < 1 )
  xDot = DefaultDataStructure;
  return
end

v     = x(1);
gamma = x(2);
psi   = x(3);
h     = x(6);
cA    = cos(d.alpha);
sA    = sin(d.alpha);
cG    = cos(gamma);
sG    = sin(gamma);
cPsi  = cos(psi);
sPsi  = sin(psi);
cPhi  = cos(d.phi);
sPhi  = sin(d.phi);

mG       = d.m*d.g;
qS       = 0.5*d.s*Density( 0.001*h )*v^2;
cL       = d.cLAlpha*d.alpha;
cD       = d.cD0 + cL^2/(pi*d.aR*d.eps);
lift     = qS*cL;
drag     = qS*cD;
vDot     = (d.thrust*cA - drag - mG*sG)/d.m + d.f(1);
fN       = lift + d.thrust*sA;
gammaDot = (fN*cPhi - mG*cG + d.f(2))/(d.m*v);
psiDot   = (fN*sPhi - d.f(3))/(d.m*v*cG);
xDot     = [vDot;gammaDot;psiDot;v*cG*sPsi;v*cG*cPsi;v*sG];


The default data structure is defined in the subfunction, DefaultDataStructure. The
data structure includes both constant parameters and control inputs.

function d = DefaultDataStructure

% The leading name-value pairs set the constant parameters and the controls
% (cD0, aR, eps, cLAlpha, s, g, m, alpha, phi, and thrust). Their numeric
% values are not legible in this copy of the listing.
d = struct( ... , 'f',zeros(3,1),'W',zeros(3,1));


We use a modified exponential atmosphere for the density: 


function rho = Density( h )

rho = 1.225*exp(-0.0817*h^1.15);


We want to maintain a force balance so that the speed of the aircraft is constant and the
aircraft does not change its flight path angle. For example, in level flight, the aircraft would
not ascend or descend. We need to control the aircraft in level flight so that the velocity stays
constant and γ = 0 for any φ. The relevant equations are

$$0 = T\cos\alpha - D \tag{9.13}$$
$$0 = (L + T\sin\alpha)\cos\phi - mg \tag{9.14}$$

We need to find T and α given φ.

A simple way is to use fminsearch. It will call RHSPointMassAircraft and numerically
find controls that, for a given φ, h, and v, have zero time derivatives. The following
code finds the equilibrium angle of attack and thrust. RHS is called by fminsearch. It returns
a scalar cost that is a quadratic of the acceleration (time derivative of velocity) and the
derivative of the flight path angle. Our initial guess is a value of thrust that balances the drag.
Even with an angle of attack guess of 0, it converges with the default set of parameters opt =
optimset('fminsearch').
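
The idea can be tried in a self-contained form by minimizing the squared residuals of Equations 9.13 and 9.14 directly. This is only a sketch under stated assumptions: all aircraft numbers are made-up placeholders, thrust is scaled by weight so both unknowns have similar magnitude, and the tolerances are tightened from the defaults. It is not the EquilibriumControls function itself.

```matlab
% Hypothetical aircraft parameters (not the Gulfstream values from the text)
m = 10000; g = 9.806; s = 50; rho = 0.4; v = 200;
cLAlpha = 2*pi; cD0 = 0.02; aR = 8; e = 0.9; phi = 0.4;

q    = 0.5*rho*v^2;
lift = @(alpha) q*s*cLAlpha*alpha;
drag = @(alpha) q*s*(cD0 + (cLAlpha*alpha)^2/(pi*aR*e));

% Residuals of Equations 9.13 and 9.14 divided by weight, u = [T/(m*g); alpha]
res  = @(u) [u(1)*m*g*cos(u(2)) - drag(u(2)); ...
             (lift(u(2)) + u(1)*m*g*sin(u(2)))*cos(phi) - m*g]/(m*g);
cost = @(u) sum(res(u).^2);          % scalar quadratic cost

u0  = [drag(0)/(m*g); 0];            % thrust balancing drag, zero angle of attack
opt = optimset('TolX',1e-8,'TolFun',1e-12,'MaxIter',2000,'MaxFunEvals',4000);
u   = fminsearch(cost,u0,opt);

T     = u(1)*m*g;                    % equilibrium thrust (N)
alpha = u(2);                        % equilibrium angle of attack (rad)
```

Scaling the unknowns is a design choice made for this sketch: a simplex search tends to behave better when a step of a given size means roughly the same thing in every coordinate.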


EquilibriumControls.m

function d = EquilibriumControls( x, d )

if( nargin < 1 )
  Demo
  return
end

[~,~,drag] = RHSPointMassAircraft( 0, x, d );
u0         = [drag;0];
opt        = optimset('fminsearch');
u          = fminsearch( @RHS, u0, opt, x, d );
d.thrust   = u(1);
d.alpha    = u(2);

%% EquilibriumControls>RHS
function c = RHS( u, x, d )

d.thrust = u(1);
d.alpha  = u(2);
xDot     = RHSPointMassAircraft( 0, x, d );
c        = xDot(1)^2 + xDot(2)^2;


The demo is for a Gulfstream 350 flying at 250 m/s and 10 km altitude. 


function Demo

d     = RHSPointMassAircraft;
d.phi = 0.4;
x     = [250;0;0.02;0;0;10000];
d     = EquilibriumControls( x, d );
r     = x(1)^2/(d.g*tan(d.phi));

fprintf('Thrust          %8.2f N\n',d.thrust);
fprintf('Altitude        %8.2f km\n',x(6)/1000);
fprintf('Angle of attack %8.2f deg\n',d.alpha*180/pi);
fprintf('Bank angle      %8.2f deg\n',d.phi*180/pi);
fprintf('Turn radius     %8.2f km\n',r/1000);

The results of the demo, shown in the following, are quite reasonable. 


>> EquilibriumControls
Thrust           7614.63 N
Altitude           10.00 km
Angle of attack     2.41 deg
Bank angle         22.92 deg
Turn radius        15.08 km
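
The bank angle and turn radius in this output can be checked by hand from the coordinated-turn relation r = v^2/(g tan φ) used in the demo. The sketch below assumes g = 9.806 m/s^2 and a bank angle of 0.4 rad, which corresponds to the 22.92 degrees shown; the value of g stored in the data structure is not visible in the listing.

```matlab
v   = 250;       % true airspeed (m/s)
g   = 9.806;     % assumed gravitational acceleration (m/s^2)
phi = 0.4;       % bank angle (rad)

r = v^2/(g*tan(phi));     % turn radius (m)
fprintf('Bank angle  %8.2f deg\n',phi*180/pi);
fprintf('Turn radius %8.2f km\n',r/1000);
```

The radius grows with the square of the speed, so halving the airspeed at the same bank angle would cut the turn radius to a quarter.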


With these values, the plane will turn without changing altitude or airspeed. We simulate the 
Gulfstream in AircraftSim. The first part runs our equilibrium computation demo. 


AircraftSim.m

%% Script to simulate a Gulfstream 350 in a banked turn
% The number of steps and the time step are assumed values; they are not
% legible in this copy of the listing.
n   = 500;
dT  = 1;
rTD = 180/pi;

%% Start by finding the equilibrium controls
d     = RHSPointMassAircraft;
d.phi = 0.4;
x     = [250;0;0.02;0;0;10000];
d     = EquilibriumControls( x, d );
r     = x(1)^2/(d.g*tan(d.phi));

fprintf('Thrust          %8.2f N\n',d.thrust);
fprintf('Altitude        %8.2f km\n',x(6)/1000);
fprintf('Angle of attack %8.2f deg\n',d.alpha*180/pi);
fprintf('Bank angle      %8.2f deg\n',d.phi*180/pi);
fprintf('Turn radius     %8.2f km\n',r/1000);


The next part does the simulation. It breaks the loop if the aircraft altitude drops to 0 or
below, that is, it crashes. We call RHSPointMassAircraft once per step to get the lift and
drag values for plotting. It is then called by RungeKutta to do the numerical integration.
@ denotes a handle to the function.
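
The internals of RungeKutta are not shown in the text. As a point of reference only, a classical fourth-order Runge-Kutta step can be sketched as follows, applied to a made-up right-hand side with a known solution; this is an assumption about the kind of integrator involved, not the authors' implementation.

```matlab
f = @(t,x) -x;           % hypothetical right-hand side; exact solution is exp(-t)
h = 0.1;                 % step size
x = 1;                   % initial state
t = 0;
for k = 1:10
  k1 = f(t,x);
  k2 = f(t + h/2, x + h*k1/2);
  k3 = f(t + h/2, x + h*k2/2);
  k4 = f(t + h,   x + h*k3);
  x  = x + h*(k1 + 2*k2 + 2*k3 + k4)/6;   % weighted average of the four slopes
  t  = t + h;
end
```

After ten steps the state should be close to exp(-1), and passing the right-hand side as a function handle is what lets the same integrator serve any model.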


%% Simulation
xPlot = zeros(length(x)+5,n);

for k = 1:n

  % Get lift and drag for plotting
  [~,L,D] = RHSPointMassAircraft( 0, x, d );

  % Plot storage
  xPlot(:,k) = [x;L;D;d.alpha*rTD;d.thrust;d.phi*rTD];

  % Integrate
  x = RungeKutta( @RHSPointMassAircraft, 0, x, dT, d );

  % A crash
  if( x(6) <= 0 )
    break;
  end
end


The remainder produces three plots. The first plot is the states that are numerically integrated. 
The next gives the controls, lift, and drag. The final plot shows the planar trajectory. We do 
unit conversions since degrees and kilometers are a bit clearer. 


%% Plot the results
xPlot        = xPlot(:,1:k);
xPlot(2:3,:) = xPlot(2:3,:)*rTD;
xPlot(4:6,:) = xPlot(4:6,:)/1000;
yL     = {'v (m/s)' '\gamma (deg)' '\psi (deg)' 'x_e (km)' 'y_n (km)' ...
          'h (km)' 'L (N)' 'D (N)' '\alpha (deg)' 'T (N)' '\phi (deg)'};
[t,tL] = TimeLabel(dT*(0:(k-1)));

PlotSet( t, xPlot(1:6,:), 'x label', tL, 'y label', yL(1:6),...
  'figure title', 'Aircraft State' );
PlotSet( t, xPlot(7:11,:), 'x label', tL, 'y label', yL(7:11),...
  'figure title', 'Aircraft Lift, Drag and Controls' );

As you can see in Figure 9.4, the radius of the turn is 15 km as expected. The drag and lift
remain constant. In practice, we would have a velocity and flight path angle control system to
handle disturbances or parameter variations. For the purpose of our deep learning example, we
just use the ideal dynamics. Figure 9.3 shows the simulation outputs.

Figure 9.3: Simulation outputs. States (the integrated quantities) are on the top. Lift, drag, and
the controls φ, α, and T are on the bottom.

Figure 9.4: Aircraft trajectory.

Figure 9.4 will provide a nice trajectory for our deep learning examples. You can change
the aircraft simulation to produce other trajectories.


9.3 Generating a Terrain Model 
9.3.1 Problem 


We want to create an artificial terrain model from a set of terrain "tiles." A tile is a segment
of terrain from a bigger picture, much like bathroom tiles make up a bathroom wall. Unless, of
course, you have the modern fiberglass shower.


9.3.2 Solution 


Find images of terrain and tile them together. There are many sources of terrain tiles.
Google Earth is one.


9.3.3 How It Works 


We start by compiling a database of terrain tiles. We have them in the folder terrain in our
MATLAB package. A segment of the terrain folder is shown in Figure 9.5. This is just one way
to get terrain tiles. There are online sources for downloading tiles. Also, many flight simulator
games have extensive terrain libraries. The name of each folder is the latitude followed by the
longitude. For example, -10-10 is -10 degrees latitude and -10 degrees longitude. Our database
only extends to ±60 degrees latitude. The first block creates a list of the folders in terrain. An
important thing with this code is that your script needs to be in the correct directory. We don't
do any fancy directory searching.

Figure 9.5: A segment of the terrain folder.

CreateTerrain.m


function CreateTerrain( lat, lon, scale )

% Demo
if( nargin < 1 )
  Demo;
  return
end

d          = dir('terrain');
latA       = zeros(1,468);
lonA       = zeros(1,468);
folderName = cell(1,468);
for k = 1:468
  q             = d(k).name;
  folderName{k} = q;
  if( q(2) == '0' )
    latA(k) = str2double(q(1:2));
    lonA(k) = str2double(q(3:end));
  else
    latA(k) = str2double(q(1:3));
    lonA(k) = str2double(q(4:end));
  end
end

The next code block finds the indices for the desired tiles. 


% Center lower left corner is start
latF = floor(lat);
lonF = floor(lon);
latI = zeros(1,9);
lonI = zeros(1,9);

latJK = latF;
lonJK = lonF;
i     = 1;
for j = 1:3
  for k = 1:3
    lonI(i) = lonJK;
    latI(i) = latJK;
    lonJK   = lonJK + 10;
    i       = i + 1;
  end
  lonJK = lonF;
  latJK = latJK + 10;
end

fldr = zeros(1,9);
for k = 1:9
  j       = find(latI(k)==latA);
  i       = lonI(k)==lonA(j);
  fldr(k) = j(i);
end


The following code creates the filenames based on our latitudes and longitudes. We just create 
correctly formatted strings. This shows one way to create strings. Notice we use %d to create 
integers. It automatically makes them the right length. We need to check for positive and 
negative so that the + and - signs are correct. 


% Generate the file names
imageSet = cell(1,9);
for k = 1:9
  j = fldr(k);
  % abs keeps a negative value from printing a second minus sign
  if( latA(j) >= 0 )
    if( lonA(j) >= 0 )
      imageSet{k} = sprintf('grid10x10+%d+%d',latA(j)*100,lonA(j)*100);
    else
      imageSet{k} = sprintf('grid10x10+%d-%d',latA(j)*100,abs(lonA(j))*100);
    end
  else
    if( lonA(j) >= 0 )
      imageSet{k} = sprintf('grid10x10-%d+%d',abs(latA(j))*100,lonA(j)*100);
    else
      imageSet{k} = sprintf('grid10x10-%d-%d',abs(latA(j))*100,abs(lonA(j))*100);
    end
  end
end
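
As an aside, the sign handling can also be delegated to the format string. The sketch below uses the + flag of sprintf, which always prints a sign character; this is an alternative shown for comparison, not the approach taken in CreateTerrain, and the numeric values are made up.

```matlab
name1 = sprintf('grid10x10%+d%+d', 3000,  6000);   % both positive
name2 = sprintf('grid10x10%+d%+d', 3000, -1000);   % negative longitude
```

With this flag a single format string covers all four sign combinations, at the cost of being a little less explicit than the nested if statements.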


The next block reads in the image, flips it upside down, and scales the image. The images
happen to be north down and south up. We first change directory to be in terrain and then cd
to go into each folder. cd .. changes directories back into terrain.

% Assuming we are one directory above
cd terrain

im = cell(1,9);
for k = 1:9
  j = fldr(k);
  cd(folderName{j})
  im{k} = ScaleImage(flipud(imread([imageSet{k},'.jpg'])),scale);
  cd ..
end


The next block of code calls image to draw each image in the correct spot on the 3 by 3 tiled 
map. 


del = size(im{1},1);
lX  = 3*del;

% Draw the images
x = 0;
y = 0;
for k = 1:9
  image('xdata',[x;x+del],'ydata',[y;y+del],'cdata', im{k} );
  hold on
  x = x + del;
  if( x == lX )
    x = 0;
    y = y + del;
  end
end
axis off
axis image

cd ..


The subfunction ScaleImage scales the image by taking the mean of each block of pixels that is
scaled down to 1 pixel. At the very end, we cd .. putting us into the original directory.

%% CreateTerrain>ScaleImage
function s2 = ScaleImage( s1, q )

n = 2^q;

[mR,~,mD] = size(s1);

m = mR/n;

s2 = zeros(m,m,mD,'uint8');

for i = 1:mD
  for j = 1:m
    r = (j-1)*n+1:j*n;
    for k = 1:m
      c         = (k-1)*n+1:k*n;
      s2(j,k,i) = mean(mean(s1(r,c,i)));
    end
  end
end
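
The block averaging is easiest to see on a tiny single-channel example. The 4 by 4 "image" below is made up, and the block size n is set directly rather than derived from a scale argument.

```matlab
s1 = uint8([10 20 30 40; 30 40 50 60; 1 3 5 7; 5 7 9 11]);  % hypothetical 4x4 image
n  = 2;                          % each n-by-n block becomes one pixel
m  = size(s1,1)/n;
s2 = zeros(m,m,'uint8');
for j = 1:m
  r = (j-1)*n+1:j*n;             % rows of the block
  for k = 1:m
    c       = (k-1)*n+1:k*n;     % columns of the block
    s2(j,k) = mean(mean(s1(r,c)));
  end
end
```

Each output pixel is the mean of one 2 by 2 block, so the 4 by 4 input shrinks to 2 by 2, and assigning into a uint8 array rounds the mean to the nearest integer.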


The demo picks a latitude and longitude in the Middle East. The resulting 3 by 3 tiled
image is shown in Figure 9.6. We won't use this image for the neural net because it would be
too low resolution for anything but a satellite.

%% CreateTerrain>Demo
function Demo

NewFigure('EarthSegment');
CreateTerrain( 30,60,1 )


Figure 9.6: Terrain tiled image of the Middle East.


9.4 Close Up Terrain
9.4.1 Problem 


We want higher-resolution terrain. 


9.4.2 Solution 


Specialize the terrain code to produce a small segment of higher-resolution terrain suitable for 
experiments with a commercial drone. 


9.4.3 How It Works 


The preceding terrain code would work well for an orbiting satellite, but not so well for a drone.
Per FAA regulations, the maximum altitude for small unmanned aircraft is 400 feet, or about
122 meters. A satellite in Low Earth Orbit (LEO) typically has an altitude of 300 to 500 km. Thus,
drones are typically about 2500 to 4000 times closer to the surface than a satellite! We take the
code and specialize it to read in just four images. It is much simpler than CreateTerrain
and is less flexible. If you want to change it, you will need to change the code in the file.
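
The altitude comparison is simple arithmetic, sketched here with the figures quoted above. The variable names are illustrative.

```matlab
hDrone = 400*0.3048;         % 400 ft expressed in meters
hLEO   = [300 500]*1000;     % LEO altitude range in meters
ratio  = hLEO/hDrone;        % how many times higher the satellite flies
```

The ratios come out near 2460 and 4100, consistent with the rounded range of 2500 to 4000 given in the text.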


CreateTerrainClose.m

function CreateTerrainClose

% Generate the file names
imageSet = {'grid1x1+3400-11800','grid1x1+3400-11900',...
            'grid1x1+3500-11800','grid1x1+3500-11900'};
p = [2 1 4 3];

% Assuming we are one directory above
cd terrainclose

im = cell(1,4);
for k = 1:4
  im{k} = flipud(imread([imageSet{k},'.jpg']));
end

del = size(im{1},1);

% Draw the images
x = 0;
y = 0;
i = 0;
for k = 1:2
  for j = 1:2
    i = i + 1;
    image('xdata',[x;x+del],'ydata',[y;y+del],'cdata', im{p(i)} );
    hold on
    x = x + del;
  end
  x = 0;
  y = y + del;
end
axis off
axis image

cd ..

Figure 9.7: Close up terrain.


We don’t have any options for scaling. This runs the function: 


>> NewFigure('EarthSegmentClose');
>> CreateTerrainClose 


Figure 9.7 shows the terrain. It is 2 degrees by 2 degrees. 


9.5 Building the Camera Model 
9.5.1 Problem 


We want to build a camera model for our deep learning system. We want a model that emulates 
the function of a drone-mounted camera. Ultimately, we will use this camera model as part of 
a terrain-based navigation system, and we’ll apply deep learning techniques to do the terrain 
navigation. 


9.5.2 Solution 


We will model a pinhole camera and create a high altitude aircraft. A pinhole camera is the 
lowest order approximation to a real optical system. We'll then build the simulation and demon- 
strate the camera. 
Figure 9.8: Pinhole camera.


9.5.3 How It Works


We've already created an aircraft simulation in Recipe 9.2. The addition will be the terrain 
model and the camera model. A pinhole camera is shown in Figure 9.8. A pinhole camera has 
infinite depth of field, and the images are rectilinear. 

A point P(x, y, z) is mapped to the imaging plane by the relationships


u = \frac{f x}{h}    (9.15)
v = \frac{f y}{h}    (9.16)

where u and v are coordinates in the focal plane, f is the focal length, and h is the distance from
the pinhole to the point along the axis normal to the focal plane. This assumes that the z-axis of
the coordinate frame x, y, z is aligned with the boresight of the camera. The angle that is seen
by the imaging chip is


\theta = \tan^{-1}\left(\frac{w}{2f}\right)    (9.17)


where w is the width of the imaging chip and f is the focal length. The shorter the focal length,
the wider the field of view. The pinhole camera has infinite depth of field, so everything is in
focus, and focus is in any case unimportant for this far-field imaging problem. The field of view
of a pinhole camera is limited only by the sensing element. Real cameras have lenses, and the
images are not perfect across the imaging array. This presents practical problems that need to be
solved in real machine vision systems.
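
As a quick illustration of the pinhole relationships, the sketch below projects one ground point using u = f x/h and v = f y/h and computes the half-angle seen by a sensor. The numbers and the symbol w (sensor width) are assumptions chosen for illustration; they are not taken from TerrainCamera.

```matlab
% Pinhole projection of a ground point onto the focal plane
f  = 0.05;           % focal length (m), illustrative value
h  = 10000;          % distance along the boresight (m)
P  = [200; -150];    % x and y offsets of the ground point (m)
uv = f*P/h;          % focal plane coordinates u and v (m)

% Half-angle seen by a sensor of width w (assumed symbol and value)
w         = 0.024;             % sensor width (m)
theta     = atan(w/(2*f));     % half-angle (rad)
footprint = 2*h*tan(theta);    % ground width covered at distance h (m)
```

Halving f in this sketch doubles tan(theta), so the camera covers twice the ground width, while the image of any single object shrinks by half.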

We want our camera to see 16 pixels by 16 pixels from the terrain image in Figure 9.7. We 
will assume a flight altitude of 10 km. Figure 9.9 gives the dimensions. 
Figure 9.9: Pinhole camera with dimensions.


We are not actually simulating any camera. Instead, our camera model produces 16 by
16 pixel maps given an input position. The output is a data structure with the x and y
coordinates and an image. If no inputs are given, it will create a tiled map of the image. We
scaled the image in the GraphicConverter app so that it is exactly


672 672 3


and saved it in the file TerrainClose.jpg. The numbers are x pixels, y pixels, and three
layers for red, green, and blue. This is a three-dimensional matrix, typical for color images.

The code is shown in the following. We convert everything to pixels, get the image using 
[~,~,i] = getimage (h), and get the segment. 

The first part of the code is to provide defaults for the user. 


TerrainCamera.m


function d = TerrainCamera( r, h, nBits, w, nP )

% Demo
if( nargin < 1 )
  Demo;
  return
end

if( nargin < 3 )
  nBits = [];
end

if( nargin < 4 )
  w = [];
end

if( nargin < 5 )
  nP = 64;
end

if( isempty(w) )
  w = 4000;
end

if( isempty(nBits) )
  nBits = 16;
end

The next part computes the pixels. 


TerrainCamera.m


dW = w/nP;
k  = floor(r(1)/dW) + nP/4 + 1;
j  = floor((w/2-r(2))/dW) - nP/4 + 1;

kR = k:(k-1 + nBits);
kJ = j:(j-1 + nBits);


The remainder displays the image. 


TerrainCamera.m


[~,~,i] = getimage(h);

d.p = i(kR,kJ,:);
d.r = r(1:2);

if( nargout < 1 )


The demo draws the source image and then the camera image. Both are shown in Figure 9.10.


  axis off
  axis image
  clear p
end

%% CreateTerrain>Demo
function Demo

h = NewFigure('Earth Segment');
i = imread('TerrainClose64.jpg');
image(i);
grid

NewFigure('Terrain Camera');
x = linspace(0,10,20);


Figure 9.10: Terrain camera source image and camera view. The camera view is 16 x 16 pixels.


1000 1100 


The terrain image from the camera is blurry because it has so few pixels. 


9.6 Plot Trajectory over an Image 
9.6.1 Problem 


We want to plot our trajectory over an image. 


9.6.2 Solution 


Create a function to draw the image and plot the trajectory on top. 


9.6.3 How It Works 


We write a function that reads in an image and plots the trajectory on top. We scale the image 
using image. The x-dimension is set and the y-dimension is scaled to match. 


PlotXYTrajectory.m


%% PLOTXYTRAJECTORY Draw an xy trajectory over an image
% Can plot multiple sets of data.
% Type PlotXYTrajectory for a demo.
%% Input
%  x      (:,:)  X coordinates (m)
%  y      (:,:)  Y coordinates (m)
%  i      (n,m)  Image
%  w      (1,1)  x dimension of the image
%  xScale (1,1)  Scale of x dimension
%  name   (1,:)  Figure name

function PlotXYTrajectory( x, y, i, xScale, name )

if( nargin < 1 )
  Demo
  return
end

s  = size(i);
xI = [-xScale xScale];
yI = [-xScale xScale]*s(2)/s(1);

NewFigure(name);
image(xI,yI,flipud(i));
hold on
n = size(x,1);
for k = 1:n
  plot(x(k,:),y(k,:),'linewidth',2)
end
set(gca,'xlim',xI,'ylim',yI);

grid on
axis image
xlabel('x (m)')
ylabel('y (m)')

The demo draws a circle over our terrain image. This is shown in Figure 9.11. 


%% PlotXYTrajectory>Demo
function Demo

i = imread('TerrainClose.jpg');
a = linspace(0,2*pi);
x = [30*cos(a);35*cos(a)];
y = [30*sin(a);35*sin(a)];
PlotXYTrajectory( x, y, i, 111, 'Trajectory' )


While the deep learning system will analyze all of the pixels in the image, it is interesting
to see how the mean value of each color varies from one camera image to the next across the
terrain. This is shown in Figure 9.12. The x-axis is the image number, going by rows of constant y. As
can be seen, there is considerable variation even in nearby images. This indicates that there is
sufficient information in each image for our deep learning system to be able to find locations.
It also shows that it might be possible just to use mean values to identify location. Remember
that each image varies from the previous by only 16 pixels.
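
The per-image color means can be computed with a few lines of MATLAB. This is a minimal sketch under stated assumptions: a random array stands in for the terrain image, and the tile size and one-pixel shift mirror the values used in this chapter. It is not the code that produced Figure 9.12.

```matlab
% Mean red, green, and blue of each 16 by 16 tile, shifting one pixel at a time
img   = uint8(randi([0 255],64,64,3));  % stand-in for the 64 by 64 terrain image
nBits = 16;                             % tile size in pixels
nI    = size(img,1) - nBits + 1;        % number of tiles along one side
rgb   = zeros(nI*nI,3);                 % one row of [R G B] means per tile
q     = 0;
for k = 1:nI
  for j = 1:nI
    tile     = double(img(k:k+nBits-1,j:j+nBits-1,:));
    q        = q + 1;
    rgb(q,:) = squeeze(mean(mean(tile,1),2))';
  end
end
```

Plotting the columns of rgb against the row index gives one curve per color, ordered by rows of constant y.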
Figure 9.11: Trajectory plot.


9.7 Creating the Test Images
9.7.1 Problem 


We want to create test images for our terrain model. 


9.7.2 Solution 
We build a script to read in the 64 by 64 pixel image and create training images.


9.7.3 How It Works 


We first create a 64 by 64 pixel version of our terrain, using any image processing app. We've already
done that, and it is saved as TerrainClose64.jpg. The following script reads in the image
and generates training images by displacing the index one pixel at a time. We save the images
in the folder TerrainImages. We also create labels. Each terrain snippet gets its own label. For
each terrain snippet, we create nN copies with noise. Thus there will be nN images with the
label 1, nN with the label 2, and so on. We add noise with the code


uint8(floor(sig*rand(nBits,nBits,3)))


since the noise must be uint8 like the image. You'll get an error if you don't convert to
uint8. You can also select different strides, that is, moving the images more than 1 pixel. The
first code sets up the image processing. We choose 16 by 16 pixel images because (after the next
step of training) there is enough information in each image to classify each one. We tried 8 by 8
pixel images, but the training didn't converge.
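
The following small sketch shows the noise step on a tiny stand-in array. The values are illustrative assumptions. It also shows that MATLAB integer arithmetic saturates, so a pixel already at 255 stays at 255 after noise is added.

```matlab
% Add uniform noise to a uint8 image snippet
tile  = uint8([0 100; 250 255]);              % tiny stand-in for an image snippet
sig   = 10;                                   % noise amplitude
noise = uint8(floor(sig*rand(size(tile))));   % integers 0 to sig-1, same class as tile
noisy = tile + noise;                         % uint8 addition saturates at 255
```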


CreateTerrainImages.m


im    = flipud(imread('TerrainClose64.jpg')); % Read in the image
wIm   = 4000;                  % Width of the image (m)
nBits = 16;                    % Size of each training image (pixels)
dN    = 1;                     % Stride between images (pixels)
nBM1  = nBits-1;
[n,m] = size(im);              % Size of the image
nI    = (n-nBits)/dN + 1;      % The number of images down one side
nN    = 10;                    % How many copies of each image we want
sig   = 3;                     % Set to > 0 to add noise to the images
dW    = wIm/64;                % Delta position for each image (m)
x0    = -wIm/2+(nBits/2)*dW;   % Starting location in the upper left corner
y0    =  wIm/2-(nBits/2)*dW;   % Starting location in the upper left corner


This line is very important. It makes sure the names correspond to distinct images. We will 
make copies of each image for training purposes. 
CreateTerrainImages.m


% Make an image serial number so they remain in order in the imageDatastore
kAdd = 10^ceil(log10(nI*nI*nN));
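
To see why the serial number matters, consider how file names sort. The sketch below assumes 49 images per side and 10 copies per image, which is consistent with the 2401 classes reported later in the chapter; the values are illustrative.

```matlab
% File names sort as text, so unpadded numbers fall out of order
nI     = 49;                          % images per side (assumed)
nN     = 10;                          % copies per image (assumed)
nTotal = nI*nI*nN;                    % total number of image files
kAdd   = 10^ceil(log10(nTotal));      % offset giving every name the same digit count

plain  = sort({'TerrainImage2.jpg','TerrainImage10.jpg'});
padded = sort({sprintf('TerrainImage%d.jpg',2+kAdd),...
               sprintf('TerrainImage%d.jpg',10+kAdd)});
```

Without the offset, image 10 sorts before image 2. With the offset, every serial number has the same number of digits, so text order and numeric order agree and the labels line up with the files.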


We do some directory manipulations here. 


CreateTerrainImages.m


% Set up the directory
if ~exist('TerrainImages','dir')
  warning('Are you in the right folder? No TerrainImages')
  [success,msg] = mkdir('./','TerrainImages')
end
cd TerrainImages
delete *.jpg % Starting from scratch so delete existing images


The image splitting is done in this code. We add noise, if desired. 


CreateTerrainImages.m


i    = 1;
l    = 1;
t    = zeros(1,nI*nI*nN); % The label for each image
x    = x0;                % Initial location
y    = y0;                % Initial location
r    = zeros(2,nI*nI);    % The x and y coordinates of each image
id   = zeros(1,nI*nI);
iR   = 1;
rgbs = [];
for k = 1:nI
  disp(k)
  kR = dN*(k-1)+1:dN*(k-1) + nBits;
  for j = 1:nI
    kJ      = dN*(j-1)+1:dN*(j-1) + nBits;
    thisImg = im(kR,kJ,:);
    rgbs(end+1,:) = [mean(mean(thisImg(:,:,1))) mean(mean(thisImg(:,:,2))) mean(mean(thisImg(:,:,3)))];
    for p = 1:nN
      s    = im(kR,kJ,:) + uint8(floor(sig*rand(nBits,nBits,3)));
      q    = s > 256;
      s(q) = 256;
      q    = s < 0;
      s(q) = 0;
      imwrite(s,sprintf('TerrainImage%d.jpg',i+kAdd));
      t(i) = l;
      i    = i + 1;
    end
    r(:,iR) = [x;y];
    id(iR)  = iR;
    iR      = iR + 1;
    l       = l + 1;
    x       = x + dW;
  end
  x = x0;
  y = y - dW;
end


Figure 9.13 shows that the images really cover the area. We also verified that the sum of R, 
G, and B was different for each image. This indicates that there is enough information for the 
machine learning algorithm. 


Figure 9.13: This figure shows that the images cover the landscape. 
9.8 Training and Testing
9.8.1 Problem 


We want to create and test a convolutional neural network. The neural net will be trained to 
associate images with an x and y location. 


9.8.2 Solution 


We create and test a convolutional neural network in TerrainNeuralNet.m. This will 
be trained on the images created earlier and will be able to return the x and y coordinates. 
Convolutional neural networks are widely used for image identification. 


9.8.3 How It Works 


This example is much like the one in Chapter 3. The difference is that each image is a separate 
category. This is like face identification where each category is a different person. 


TerrainNeuralNet.m


%% Script implementing the terrain neural net
% You must have created the images in TerrainImages with CreateTerrainImages
% before running this script.

%% Get the images
cd TerrainImages
label = load('Label');
cd ..
t          = categorical(label.t);
nClasses   = max(label.t);
imds       = imageDatastore('TerrainImages','labels',t);
labelCount = countEachLabel(imds);

% Display a few snapshots
NewFigure('Terrain Snapshots');
n  = 4;
m  = 5;
ks = sort(randi(length(label.t),1,n*m)); % random selection
for i = 1:n*m
  subplot(n,m,i);
  imshow(imds.Files{ks(i)});
  title(sprintf('Image %d: %d',ks(i),label.t(ks(i))))
end

% We need the size of the images for the input layer
img = readimage(imds,1);

% Split into training and testing sets
fracTraining = 0.8;
[imdsTrain,imdsTest] = splitEachLabel(imds,fracTraining,'randomized');
%% Training
% This gives the structure of the convolutional neural net
layers = [
    imageInputLayer(size(img))

    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,'Stride',2)

    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer

    maxPooling2dLayer(2,'Stride',2)

    fullyConnectedLayer(nClasses)
    softmaxLayer
    classificationLayer
    ];
disp(layers)

options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...
    'MaxEpochs',6, ...
    'MiniBatchSize',100,...
    'ValidationData',imdsTest, ...
    'ValidationFrequency',10, ...
    'ValidationPatience',inf,...
    'Shuffle','every-epoch', ...
    'Verbose',false, ...
    'Plots','training-progress');

disp(options)
fprintf('Fraction for training %8.2f%%\n',fracTraining*100);

terrainNet = trainNetwork(imdsTrain,layers,options);

%% Test the neural net
predLabels = classify(terrainNet,imdsTest);
testLabels = imdsTest.Labels;

accuracy = sum(predLabels == testLabels)/numel(testLabels);
fprintf('Accuracy is %8.2f%%\n',accuracy*100)

save('TerrainNet','terrainNet')


We have an image layer to read in each image. We next convolve them with filters. The weights
of the filters are determined during the learning. We normalize the outputs and pass them through the
ReLU activation function. Pooling compresses the data. The 'same' padding option sets the output size
of each convolution equal to its input size. As seen in the layers printout, the pooling layers use no
padding. The first convolution layer has eight 3 by 3 pixel filters. The second has 32 3 by 3 pixel filters.
The final set of layers is used to classify the images. As noted in the previous section, each
image has a unique "class" which is associated with its location. We use a constant learning
rate. The batch size is smaller than the default.


Figure 9.14: These selected terrain images show what the neural net is classifying.


Figure 9.14 shows some of the images. Figure 9.15 shows the training window. The neural net is able
to categorize the images within the 6 training epochs. The difference between two adjacent images is
only 16 pixels. It isn't a lot of data, but the neural net can categorize each image with 100% accuracy.

In each epoch in Figure 9.15, it is processing all of the training data.


>> TerrainNeuralNet
  12x1 Layer array with layers:

     1   ''   Image Input             16x16x3 images with 'zerocenter' normalization
     2   ''   Convolution             8 3x3 convolutions with stride [1 1] and padding 'same'
     3   ''   Batch Normalization     Batch normalization
     4   ''   ReLU                    ReLU
     5   ''   Max Pooling             2x2 max pooling with stride [2 2] and padding [0 0 0 0]
     6   ''   Convolution             32 3x3 convolutions with stride [1 1] and padding 'same'
     7   ''   Batch Normalization     Batch normalization
     8   ''   ReLU                    ReLU
     9   ''   Max Pooling             2x2 max pooling with stride [2 2] and padding [0 0 0 0]
    10   ''   Fully Connected         2401 fully connected layer
    11   ''   Softmax                 softmax
    12   ''   Classification Output   crossentropyex
  TrainingOptionsSGDM with properties:

                     Momentum: 0.9000
             InitialLearnRate: 0.0100
    LearnRateScheduleSettings: [1x1 struct]
             L2Regularization: 1.0000e-04
      GradientThresholdMethod: 'l2norm'
            GradientThreshold: Inf
                    MaxEpochs: 6
                MiniBatchSize: 100
                      Verbose: 0
             VerboseFrequency: 50
               ValidationData: [1x1 matlab.io.datastore.ImageDatastore]
          ValidationFrequency: 10
           ValidationPatience: Inf
                      Shuffle: 'every-epoch'
               CheckpointPath: ''
         ExecutionEnvironment: 'auto'
                   WorkerLoad: []
                    OutputFcn: []
                        Plots: 'training-progress'
               SequenceLength: 'longest'
         SequencePaddingValue: 0
         DispatchInBackground: 0

Fraction for training    80.00%
Accuracy is   100.00%


Figure 9.15: Training Progress (01-Oct-2019 18:33:00).


We get 100% accuracy. You can explore changing the number of layers and trying different 
activation functions. 
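
When experimenting with the layers, it helps to track how the data size changes through the network. The sketch below is a hand calculation based on the layer printout above (16x16x3 input, 'same' convolutions, 2x2 pooling with stride 2, and 2401 classes). It does not call the Deep Learning Toolbox, and the variable names are our own.

```matlab
% Track the activation size through the terrain network
sz = [16 16 3];                          % input image
sz = [sz(1:2) 8];                        % 3x3 convolution, 8 filters, 'same' padding
sz = [floor((sz(1:2)-2)/2)+1 sz(3)];     % 2x2 max pooling, stride 2, no padding
sz = [sz(1:2) 32];                       % 3x3 convolution, 32 filters, 'same' padding
sz = [floor((sz(1:2)-2)/2)+1 sz(3)];     % 2x2 max pooling, stride 2, no padding

nFeatures = prod(sz);                    % inputs to the fully connected layer
nClasses  = 2401;                        % from the layer printout
nWeights  = nFeatures*nClasses + nClasses; % weights plus biases in that layer
```

Most of the learnable parameters sit in the fully connected layer, so changing the number of filters in the last convolution has a large effect on the size of the network.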


9.9 Simulation 
9.9.1 Problem 


We want to test our deep learning algorithm using our terrain model. 


9.9.2 Solution 


We build a simulation using the trained neural net. 


9.9.3 How It Works 


We reproduce the simulation from the previous section and remove some unneeded output so 
that we can focus on the neural net. We read in the trained neural net. 


AircraftNNSim.m 


%% Load the neural net
nN = load('TerrainNet');


The neural net classifies the image obtained by the camera. We convert the category into an
integer using int32. The subplot displays the image the neural net identifies as matching
the camera image and the camera image. The simulation loop stops if your altitude, x(6), is
less than 1.


%% Start by finding the equilibrium controls
d     = RHSPointMassAircraft;

36 V eub 

d.phi = atan(v^2/(r*d.g));
x     = [v;0;0;-r;0;10000];
d     = EquilibriumControls( x, d );

%% Simulation
xPlot = zeros(length(x)+3,n);

% Put the image in a figure so that we can read it
h = NewFigure('Earth Segment');
i = imread('TerrainClose64.jpg');
image(i);
axis image


eon 


NewFigure('Camera');
for k = 1:n

  % Get the image for the neural net
  im = TerrainCamera( x(4:5), h, nBits );

  % Run the neural net
  dL = classify(nN.terrainNet,im.p);

  % Plot storage
ab = alice Sa Ib) y 
xPlot(:,k) e [erat sigs pa) abl] y 


  % Integrate
  x = RungeKutta( @RHSPointMassAircraft, 0, x, dT, d );

  % A crash
  if( x(6) < 1 )
    break;
  end
end


%% Plot the results
xPlot        = xPlot(:,1:k);
xPlot(2,:)   = xPlot(2,:)*rTD;
xPlot(4:6,:) = xPlot(4:6,:);
yL           = {'v (m/s)' '\gamma (deg)' '\psi (deg)' 'x (m)' 'y (m)' 'h (m)'};


Figure 9.16 shows the trajectory and the camera view. We simulate one full circle.
The identified terrain segment and the path, based on the neural network location, are shown
in Figure 9.17. The neural net classifies the terrain it is seeing. The location of each image is
read out and used to plot the trajectory.


Figure 9.16: The camera view and trajectory. This is one full circle.


The 2D trajectory is shown in Figure 9.18 for a circular path. We make sure we are in the
regions where each image is a one pixel change from the previous image. In the corners the
camera would stay in one image until that image were exited. On the edges there is one image
border. If we were in that region, the resolution would be low. The trajectory from the images
is reasonably close to the actual trajectory. Better results would require higher resolution. In
practice, the measured positions would be inputs to a Kalman filter [30] that modeled the aircraft
dynamics, given earlier in this chapter. This would smooth the trajectory and improve accuracy.

This chapter shows how a neural network can be used to identify terrain for the purposes
of aircraft navigation. We simplify things by flying at a constant altitude, using a pinhole camera
model with a fixed image orientation, and ignoring clouds and other complications. We train a
convolutional neural network on the terrain images, with good results. As noted, higher resolution
images and a Kalman filter would produce a smoother trajectory.


Figure 9.17: The identified terrain segments and the aircraft path.


Figure 9.18: The identified terrain segments and the aircraft path.


CHAPTER 10


Stock Prediction


10.1 Introduction 


The goal of a stock prediction algorithm is to recommend a portfolio of stocks that will
maximize an investor's return. The investor has a finite amount of money and wants to create a
portfolio to maximize her or his return on investment. The neural network in this chapter will
predict the behavior of a stock given its history. This could then be used to select a portfolio of
stocks with some idea of the future performance. The stock market model is based on Geometric
Brownian Motion. Given that model, we could do a statistical analysis that would allow us to pick
stocks. We'll show that a neural net, which does not have any knowledge of the model, can do as
well in modeling the stocks.


10.2 Generating a Stock Market 
10.2.1 Problem 


We want to create an artificial stock market that replicates real stocks. 


10.2.2 Solution 

Implement Geometric Brownian Motion. This was invented by Paul Samuelson, Nobel Laureate [25].

10.2.3 How It Works 


Paul Samuelson [10] created a stock model based on Geometric Brownian Motion. This
approach produces realistic numbers and will not go negative. This is effectively a random walk
in log-space. The stochastic differential equation is


dS(t) = r S\,dt + \sigma S\,dW(t)    (10.1)


S is the stock price. W(t) is a Brownian (random walk) process. t is the time and dt is the time
differential. r is the drift and \sigma is the volatility. Both range from 0 to 1. It could also be written
in differential equation form as


\frac{dS}{dt} = \left( r + \sigma \frac{dW(t)}{dt} \right) S    (10.2)


The solution is


S(t) = S(0)\, e^{\left( r - \frac{1}{2}\sigma^2 \right) t + \sigma W(t)}    (10.3)


© Michael Paluszek and Stephanie Thomas 2020
M. Paluszek and S. Thomas, Practical MATLAB Deep Learning,
https://doi.org/10.1007/978-1-4842-5124-9_10


The following shows the code used to generate the stock trends. We use cumsum to sum 
the random numbers for the random walk. We use a Gaussian or normal distribution produced 
by randn to create the random numbers. The function can create multiple stocks. 


StockPrice.m


function [s, t] = StockPrice( s0, r, sigma, tEnd, nInt )

if( nargin < 1 )
  Demo
  return
end

delta  = tEnd/nInt;
sDelta = sqrt(delta);
t      = linspace(0,tEnd,nInt+1);
m      = length(s0);
% Sum along time (dimension 2) so that each row is an independent random walk
w      = [zeros(m,1) cumsum(sDelta.*randn(m,nInt),2)];
s      = zeros(m,nInt+1);
f      = r - 0.5*sigma.^2;
for k = 1:m
  s(k,:) = s0(k)*exp(f(k)*t + sigma(k)*w(k,:));
end


The demo is based on the Wilshire 5000 statistics. It is an index of all US stocks. If you 
run it, you will get different values since the input is random. 


%% StockPrice>Demo
function Demo

tEnd  = 5.75;
n     = 1448;
s0    = 8242.38;
r     = 0.1682262;
sigma = 0.1722922;
StockPrice( s0, r, sigma, tEnd, n );


The results are shown in Figure 10.1. They look like a real stock. Changing the drift or
volatility will change the overall shape. For example, if you set the volatility \sigma = 0, you get
the very nice stock shown in Figure 10.2. Increasing r makes the stock grow faster. This gives
us the general rule that we want high r and low \sigma. See Figure 10.3.
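
The effect of drift and volatility can be seen directly from the solution of the Geometric Brownian Motion equation, S(t) = S(0) exp((r - sigma^2/2) t + sigma W(t)). The following is a minimal standalone sketch, not the book's StockPrice function; the starting price and interval count are illustrative assumptions, and r = 0.1 with sigma = 0.6 matches the high-volatility case of Figure 10.3.

```matlab
% One Geometric Brownian Motion path and its zero-volatility counterpart
s0    = 100;      % starting price (illustrative)
r     = 0.1;      % drift
sigma = 0.6;      % volatility
tEnd  = 5;        % duration (years)
nInt  = 1000;     % number of intervals

dt     = tEnd/nInt;
t      = linspace(0,tEnd,nInt+1);
w      = [0 cumsum(sqrt(dt)*randn(1,nInt))];       % Brownian path W(t)
s      = s0*exp((r - 0.5*sigma^2)*t + sigma*w);    % price path, always positive
sNoVol = s0*exp(r*t);                              % the same stock with sigma = 0

expRate = r - 0.5*sigma^2;   % growth rate in the exponent, negative in this case
```

With these values the exponent's growth rate is negative, so a typical path drifts down even though r is positive. That is why high volatility can outweigh a modest drift.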
Figure 10.1: A random stock based on statistics from the Wilshire 5000.


Our model is based on two coefficients. We could make a stock picking algorithm by just
fitting stock price curves and computing r and \sigma. However, we want to see how well deep
learning does. Remember, this is a simple model of stock prices. Both \sigma and r could also be
functions of time, or random variables by themselves. Of course, there are other stock models
too! The idea here is that deep learning creates its own internal model without a need to be told
about the model underlying the observed data.

The function PlotStock.m plots the stock price. Notice that we format the y tick labels 
ourselves to get rid of the exponential format that MATLAB would normally employ. gca 
returns the current axes handle. 


Figure 10.2: A stock with zero volatility. This is a good stock to own, though the index fund
isn't too bad either.


PlotStock.m


function PlotStock(t,s,symb)

if( nargin < 1 )
  Demo;
  return;
end

m = size(s,1);


BULGES (ie iy, eb UNS SA ensi oc Idco EE quet lc Tu 
  'Stocks','Plot Set',{1:m},'legend',{symb});

% Format the ticks
yT  = get(gca,'YTick');
yTL = cell(1,length(yT));
for k = 1:length(yT)
  yTL{k} = sprintf('%5.0f',yT(k));
end
set(gca,'YTickLabel', yTL)

The built-in demo of PlotStock is the same as in StockPrice. 
function Demo

tEnd  = 5.75;
nInt  = 1448;
s0    = 8242.38;
r     = 0.1682262;
sigma = 0.1722922;
[s,t] = StockPrice( s0, r, sigma, tEnd, nInt );
PlotStock(t,s,{})


Figure 10.3: A stock with high volatility and low drift such that r - \frac{1}{2}\sigma^2 < 0. In this case,
r = 0.1 and \sigma = 0.6.


10.3 Create a Stock Market
10.3.1 Problem


We want to create a stock market.


10.3.2 Solution


Use the stock price function to create 100 stocks with randomly chosen parameters. 


10.3.3 How It Works 


We write a function that randomly picks stock starting prices, volatility, and drift. It also creates
random three letter stock names. We use a folded normal distribution (the absolute value of a normal
draw) for the stock prices. This code generates the random market. We limit the drift to between 0
and 0.5. This creates more stocks (for small markets) that go down.


StockMarket.m


function d = StockMarket( nStocks, s0Mean, s0Sigma, tEnd, nInt )

if( nargin < 1 )
  Demo
  return
end

d.s0    = abs(s0Mean + s0Sigma*randn(1,nStocks));
d.r     = 0.5*rand(1,nStocks);
d.sigma = rand(1,nStocks);
s       = 'A':'Z';
for k = 1:nStocks
  j         = randi(26,1,3);
  d.symb{k} = s(j);
end


The following code plots all of the stocks on one plot. We create a legend and make the y labels
integers (using PlotStock).


% Output
if( nargout < 1 )
  s = StockPrice( d.s0, d.r, d.sigma, tEnd, nInt );
  t = linspace(0,tEnd,nInt+1);
  PlotStock(t,s,d.symb);
  clear d
end


The demo is shown as follows.


%% StockPrice>Demo
function Demo

nStocks = 5;     % number of stocks
s0Mean  = 8000;  % Mean stock price
s0Sigma = 3000;  % Standard dev of price
tEnd    = 5.75;  % years duration for market
nInt    = 1448;  % number of intervals

StockMarket( nStocks, s0Mean, s0Sigma, tEnd, nInt );




Figure 10.4: Two runs of random five stock markets. 
Two runs are shown in Figure 10.4.
A stock market with a hundred stocks is shown in Figure 10.5.


10.4 Training and Testing
10.4.1 Problem 


We want to build a deep learning system to predict the performance of a stock. This can be 
applied to the stock market created earlier to predict the performance of a portfolio. 


10.4.2 Solution 


The history of a stock is a time series. We will use a long short-term memory (LSTM) network
to predict the future performance of the stock based on past data. Past performance is not
necessarily indicative of future results. All investments carry some amount of risk. You are
encouraged to consult with a certified financial planner prior to making any investment decisions.
This utilizes the Deep Learning Toolbox's lstmLayer layer. We will use part of the time series
to test the results.


10.4.3 How It Works 


An LSTM layer learns long-term dependencies between time steps in a time series. It
automatically deweights past data. LSTMs have replaced simple recurrent neural nets (RNNs) in many
applications.

StockMarketNeuralNet implements the neural network. The first part creates a market
with a single stock. We set the random number generator to its default value,
rng('default'), so that every time you run the script, you will get the same result. If
you remove this line, you will get different results each time. The neural network training data
is the time sequence and the time sequence shifted by one time step.
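
The shifted training pair can be sketched on a short made-up price series. The values are illustrative assumptions; the indexing is the same idea the script applies to the normalized stock prices.

```matlab
% Build input and target sequences that differ by one time step
s      = [10 11 13 12 15 16];   % made-up price history
xTrain = s(1:end-1);            % inputs: the price at step k
yTrain = s(2:end);              % targets: the price at step k+1
```

Each target is simply the next input, so the network learns a one-step-ahead prediction.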


StockMarketNeuralNet.m


%% Script using LSTM to predict stock prices
%% See also:
% lstmLayer, sequenceInputLayer, fullyConnectedLayer, regressionLayer,
% trainingOptions, trainNetwork, predictAndUpdateState

% Reset the random number generator so we always get the same case
rng('default')

layerSet = 'two lstm'; % 'lstm' 'bilstm' and 'two lstm' are available

%% Generate the stock market example
n    = 1448;
tEnd = 5.75;
d    = StockMarket( 1, 8000, 3000, tEnd, n );
s    = StockPrice( d.s0, d.r, d.sigma, tEnd, n );
t    = linspace(0,tEnd,n+1);

PlotStock(t,s,d.symb);


The stock price is shown in Figure 10.6. We divide the outputs into training and testing 
data. We used the testing data for validation. 
Figure 10.6: A stock price.


StockMarketNeuralNet.m


%% Divide into training and testing data
n      = length(s);
nTrain = floor(0.8*n);
sTrain = s(1:nTrain);
sTest  = s(nTrain+1:n);
sVal   = sTest;

% Normalize the training data
mu    = mean(sTrain);
sigma = std(sTrain);

sTrainNorm = (sTrain-mu)/sigma; % normalize the data to zero mean

% Normalize the test data
sTestNorm = (sTest - mu) / sigma;
sTest     = sTestNorm(1:end-1);
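
The normalization can be checked on a small made-up series. This sketch uses illustrative values. It applies the training mean and standard deviation to the test data, as the listing does, and shows how to map normalized values back to prices.

```matlab
% Normalize with the training statistics and map back to prices
sTrain = [100 102 101 105 110];   % made-up training prices
sTest  = [111 108];               % made-up test prices

mu    = mean(sTrain);
sigma = std(sTrain);

sTrainNorm = (sTrain - mu)/sigma;   % zero mean, unit standard deviation
sTestNorm  = (sTest  - mu)/sigma;   % same scaling applied to the test data
sBack      = sTestNorm*sigma + mu;  % undo the scaling to recover prices
```

Predictions from the network are in normalized units, so the last line is the step needed before comparing them with actual prices.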


The next part trains the network. We use the "Adam" method [17]. Adam is a first-order
gradient-based optimization of stochastic objective functions. It is computationally efficient
and works well with problems with noisy or sparse gradients. See the reference for more details.
We have a four-layer network including an LSTM layer.
% We are training the LSTM using the previous step
xTrain = sTrainNorm(1:end-1);
yTrain = sTrainNorm(2:end);

% Validation data
muVal    = mean(sVal); % Must normalize over just this data
sigmaVal = std(sVal);
sValNorm = (sVal-muVal)/sigmaVal;

xVal = sValNorm(1:end-1);
yVal = sValNorm(2:end);

numFeatures    = 1;
numResponses   = 1;
numHiddenUnits = 200;

switch layerSet
  case 'lstm'
    layers = [sequenceInputLayer(numFeatures)
              lstmLayer(numHiddenUnits)
              fullyConnectedLayer(numResponses)
              regressionLayer];
  case 'bilstm'
    layers = [sequenceInputLayer(numFeatures)
              bilstmLayer(numHiddenUnits)
              fullyConnectedLayer(numResponses)
              regressionLayer];
  case 'two lstm'
    layers = [sequenceInputLayer(numFeatures)
              lstmLayer(numHiddenUnits)
              reluLayer
              lstmLayer(numHiddenUnits)
              fullyConnectedLayer(numResponses)
              regressionLayer];
  otherwise
    error('Only 3 sets of layers are available');
end
analyzeNetwork(layers);


options = trainingOptions('adam', ...
    'MaxEpochs',300, ...
    'GradientThreshold',1, ...
    'InitialLearnRate',0.005, ...
    'LearnRateDropPeriod',125, ...
    'LearnRateDropFactor',0.2, ...
    'Shuffle','every-epoch', ...
    'ValidationData',{xVal,yVal}, ...
    'LearnRateSchedule','piecewise', ...
    'ValidationFrequency',5, ...
    'Verbose',0, ...
    'Plots','training-progress');

net = trainNetwork(xTrain,yTrain,layers,options);


The neural net consists of four layers: 


layers = [sequenceInputLayer(numFeatures)
          lstmLayer(numHiddenUnits)
          fullyConnectedLayer(numResponses)
          regressionLayer];


This is the minimum set of layers. The layer structure is shown by analyzeNetwork in
Figure 10.7. analyzeNetwork isn't too interesting for such a simple structure. It is more
interesting when you have dozens or hundreds of layers. We also provide the option to try a
BiLSTM layer and two LSTM layers.


1. sequenceInputLayer(inputSize) defines a sequence input layer. inputSize
is the size of the input sequence at each time step. In our problem, the sequence is just the
last value in the time sequence so inputSize is 1. You could have longer sequences.


2. lstmLayer(numHiddenUnits) creates a long short-term memory layer. numHiddenUnits
is the number of hidden units in the layer. The number of hidden units is the
number of neurons in the layer.


3. fullyConnectedLayer creates a fully connected layer with specified output size.


4. regressionLayer creates a regression output layer for a neural network. Regression
is data fitting.


Figure 10.7: Layer structure.


The learning rate starts at 0.005. It is multiplied by 0.2 every 125 epochs in a piecewise
manner using these options:


'InitialLearnRate',0.005,
'LearnRateSchedule','piecewise',
'LearnRateDropPeriod',125,
'LearnRateDropFactor',0.2,
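
The effect of these options can be tabulated directly. The following is a minimal sketch, assuming
the rate is multiplied by the drop factor once each full drop period has passed (so the first drop
takes effect at epoch 126); the variable names are illustrative and are not trainingOptions
properties.

```matlab
% Sketch of a piecewise schedule: multiply the rate by dropFactor after
% every dropPeriod epochs. The epoch at which each drop takes effect is
% an assumption of this sketch.
initialRate = 0.005;
dropFactor  = 0.2;
dropPeriod  = 125;
maxEpochs   = 300;

epoch     = 1:maxEpochs;
nDrops    = floor((epoch-1)/dropPeriod);   % completed drop periods
learnRate = initialRate*dropFactor.^nDrops;

% Print the rate at the start of each segment
kShow = [1 126 251];
fprintf('Epoch %3d: %g\n',[kShow; learnRate(kShow)]);
```

With 300 epochs the rate takes three values: 0.005, then 0.001, then 0.0002 for the last 50 epochs.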


We let “patience” be inf. This means the learning will continue to the last epoch even 
if no progress is made. The training window is shown in Figure 10.8. The top plot shows the 
root-mean-square error (RMSE) calculated from the data and the bottom plot the loss. We are 
also using the test data for validation. Note that the validation data needs to be normalized with 
its own mean and standard deviation. 

The final part tests the network using predictAndUpdateState. We need to unnor- 
malize the output for plotting. 


Figure 10.8: The training window with 250 iterations. 
%% Demonstrate the neural net
yPred    = predict(net,sTest);
yPred(1) = yTrain(end-1);
yPred(2) = yTrain(end);
yPred    = sigma*yPred + mu;

%% Plot the prediction
NewFigure('Stock prediction')
plot(t(1:nTrain-1),sTrain(1:end-1));
hold on
plot(t,s,'--g');
grid on
hold on
k = nTrain+1:n;
plot(t(k),[s(nTrain) yPred],'-')
xlabel("Year")
ylabel("Stock Price")
title("Forecast")
legend(["Observed" "True" "Forecast"])

% Format the ticks
yT  = get(gca,'YTick');
yTL = cell(1,length(yT)); % left-hand variable name is partly illegible in the source


Compare Figure 10.9 with Figure 10.6. The red is the prediction. The prediction reproduces 
the trend of the stock. It gives you an idea of how it might perform. The neural network cannot 
predict the exact stock history but does recreate the overall performance that is expected. 

Results for the BiLSTM layer and two LSTM layers are shown in Figure 10.10. All produce 
acceptable models. 

This chapter demonstrates that an LSTM can produce an internal model that replicates the 
behavior of a system from just observations of the process. In this case, we had a model, but 
in many systems, a model does not exist or has considerable uncertainty in its form. For this 
reason, neural nets can be a powerful tool when working with dynamical systems. We haven't 
tried this with real stocks. Do not use this for predicting real stock performance. 
Figure 10.9: The prediction with one LSTM layer.

Figure 10.10: The top sets are the BiLSTM set and the bottom are the two LSTM layer sets.


CHAPTER 11


Image Classification


11.1 Introduction 


Image classification can be done with pretrained networks. MATLAB makes it easy to access 
and use these networks. This chapter shows you two examples. 


11.2 Using a Pretrained Network 
11.2.1 Problem 


We want to use a pretrained network for image identification. First we will use AlexNet, then 
GoogLeNet. 


11.2.2 Solution 


Install AlexNet and GoogLeNet from the Add-On Explorer. Load some images and test. These 
are classification networks so we will use classify to run them. 


11.2.3 How It Works 


First we need to download the support packages with the Add-On Explorer. If you attempt to 
run alexnet or googlenet without having them installed, you will get a link directly to the 
package in the Add-On Explorer. You will need your MathWorks password. 

AlexNet is a pretrained convolutional neural network (CNN) that has been trained on
approximately 1.2 million images from the ImageNet data set (http://image-net.org/index). The
model has 25 layers, as the printout below shows, and can classify images into 1000 object
categories. It can be used for all sorts of object identification. However, if an object was not
in the training set, it won’t be able to identify the object.


AlexNetTest.m

%% Load the network
% Access the trained model. This is a SeriesNetwork.
net = alexnet;
net

% See details of the architecture
net.Layers


The network layers' printout is shown as follows.


>> AlexNetTest
ans =

  25x1 Layer array with layers:

     1   'data'     Image Input                   227x227x3 images with 'zerocenter' normalization
     2   'conv1'    Convolution                   96 11x11x3 convolutions with stride [4 4] and padding [0 0 0 0]
     3   'relu1'    ReLU                          ReLU
     4   'norm1'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     5   'pool1'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
     6   'conv2'    Grouped Convolution           2 groups of 128 5x5x48 convolutions with stride [1 1] and padding [2 2 2 2]
     7   'relu2'    ReLU                          ReLU
     8   'norm2'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     9   'pool2'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
    10   'conv3'    Convolution                   384 3x3x256 convolutions with stride [1 1] and padding [1 1 1 1]
    11   'relu3'    ReLU                          ReLU
    12   'conv4'    Grouped Convolution           2 groups of 192 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
    13   'relu4'    ReLU                          ReLU
    14   'conv5'    Grouped Convolution           2 groups of 128 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
    15   'relu5'    ReLU                          ReLU
    16   'pool5'    Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
    17   'fc6'      Fully Connected               4096 fully connected layer
    18   'relu6'    ReLU                          ReLU
    19   'drop6'    Dropout                       50% dropout
    20   'fc7'      Fully Connected               4096 fully connected layer
    21   'relu7'    ReLU                          ReLU
    22   'drop7'    Dropout                       50% dropout
    23   'fc8'      Fully Connected               1000 fully connected layer
    24   'prob'     Softmax                       softmax
    25   'output'   Classification Output         crossentropyex with 'tench' and 999 other classes


There are many layers in this convolutional network. ReLU and Softmax are the activation
functions. In the first layer, “zerocenter” normalization is used. This means the training-set
mean is subtracted from each input image so that the data is centered on zero; the values are
not rescaled. Two layer types are new, cross-channel normalization and grouped convolution.
Filter groups, also known as grouped convolution, were introduced with AlexNet in 2012. You can
think of the output of each filter as a channel and filter groups as groups of the channels.
Filter groups allowed more efficient parallelization across GPUs. They also improved
performance. Cross-channel normalization normalizes across channels, instead of one channel at
a time. We’ve discussed convolution in Chapter 3. The weights in each filter are determined
during training. Dropout is a layer that ignores nodes, randomly, when training the weights.
This reduces interdependencies between nodes.
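
Three of the operations named in the layer list are simple enough to try by hand. The following is
a minimal sketch on made-up numbers; the variables x, img, and meanImg are hypothetical and are not
taken from AlexNet.

```matlab
% Sketch of ReLU, softmax, and zero-centering on made-up numbers.
x = [-2 0.5 3 1];              % hypothetical activations

y    = max(x,0);               % ReLU: negative values become zero
expX = exp(x - max(x));        % subtract the max for numerical stability
p    = expX/sum(expX);         % softmax: positive scores that sum to 1

% Zero-centering: subtract a mean image, with no rescaling
img      = [10 20; 30 40];     % hypothetical 2x2 single-channel image
meanImg  = [12 18; 28 42];     % hypothetical training-set mean image
centered = img - meanImg;
```

The softmax output is what classify reports as scores, which is why the scores for one image sum
to 1 over the 1000 categories.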

For our first example, we load an image that comes with MATLAB, of a set of peppers. We 
use the top left corner as input to the net. Note that each pretrained network has a fixed input 
image size that we can determine from the first layer. 
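
The cropping step can be sketched without loading a network. This is a minimal sketch, assuming a
random array in place of the image; the size 227x227x3 is the AlexNet input size from the layer
printout.

```matlab
% Sketch: crop the top left corner of an image to a network input size.
% A random array stands in for the image.
img     = uint8(randi([0 255],384,512,3));
sz      = [227 227 3];                      % rows, columns, channels
imgCrop = img(1:sz(1),1:sz(2),1:sz(3));
disp(size(imgCrop))
```

Cropping discards everything outside the corner. Resizing the whole image, for example with
imresize from the Image Processing Toolbox, is an alternative that keeps the full scene at the
cost of changing its scale.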




AlexNetTest.m

%% Load a test image and classify it
% Read the image to classify
I = imread('peppers.png'); % ships with MATLAB

% Adjust size of the image to the net's input layer
sz = net.Layers(1).InputSize;
I  = I(1:sz(1),1:sz(2),1:sz(3));

% Classify the image using AlexNet
[label, scorePeppers] = classify(net, I);

% Show the image and the classification results
NewFigure('Pepper'); ax = gca;
imshow(I);
title(ax,label);

PlotSet(1:length(scorePeppers),scorePeppers,'x label','Category',...
        'y label','Score','plot title','Peppers'); % trailing arguments partly illegible in the source


The images and results for the AlexNet example are shown in Figure 11.1. The pepper
scores are tightly clustered.

For fun, and to learn more about this network, we print out the categories that had the next
highest scores, sorted from high to low. The categories are stored in the last layer of the net
in its Classes property.


Figure 11.1: Test image labeled with the classification and the scores. The image is classified
as a “bell pepper.”


AlexNetTest.m

% What other categories are similar?
disp('Categories with highest scores for Peppers:')
kPos = find(scorePeppers>0.01);
[vals,kSort] = sort(scorePeppers(kPos),'descend');
for k = 1:length(kSort)
  fprintf('%13s:\t%g\n',net.Layers(end).Classes(kPos(kSort(k))),vals(k));
end


The results show that the net was considering all fruits and vegetables! The Granny Smith
had the next highest score, followed by cucumber, while the fig and lemon had much smaller
scores. This makes sense since Granny Smiths and cucumbers are also usually green.


Categories with highest scores for Peppers: 


bell pepper: 0.700013 
Granny Smith: 0.180637 
cucumber: 0.0435253 
fig: 0.0144056 
lemon:  0.0100655 


We also have two of our own test images. One is of a cat and one of a metal box, shown in
Figure 11.2. The scores for the cat classification are shown as follows.


Categories with highest scores for Cat: 
tabby: 0.805644 
Egyptian cat: (0) - alise 72 
tiger cat: 0.0338047 


The selected label is tabby. It is clear that the net can recognize that the photo is of a cat, as 
the other highest scored categories are also kinds of cats. Although what a tiger cat might be, 
as distinguished from a tabby, we can't say... 


Figure 11.2: Raw test images Cat.png and Box.jpg.

Figure 11.3: Test images and the classification by AlexNet. They are classified as “tabby” and
“hard disc.”


The metal box proves the biggest challenge to the net. The category scores above 0.05 are 
shown as follows and the images with their label are shown in Figure 11.3. 


Categories with highest scores for Box: 
Naradakdise: m 9599 


loupe: 0.0731844 
modem:  0.0702888 
pick: 0.0610284 
iPod: 0.0595867 

CD player:  0.0508571 


In this case, the hard disc is by far the highest score, but the score is much lower than that 
of the tabby cat—roughly 0.3 vs. 0.8. The summary of scores is 


AlexNet results summary: 


Pepper 0.7000 
Cat 0.8056 
Box 02:95 


Now let's compare these results to GoogLeNet. GoogLeNet is a pretrained model that has 
also been trained on a subset of the ImageNet database which is used in the ImageNet Large- 
Scale Visual Recognition Challenge (ILSVRC). The model is trained on more than a million 
images, has 144 layers (a lot more than the AlexNet), and can classify images into 1000 object 
categories. First we load the pretrained network as before. 


GoogleNetTest.m

%% Load the pretrained network
net = googlenet;
net % display the 144 layer network

The net display is shown as follows. It is a different type than AlexNet, a DAGNetwork:


net = 
DAGNetwork with properties: 


Layers: [144x1 nnet.cnn.layer.Layer] 
Connections: [170x2 table] 


Next we test it on the image of peppers. 


GoogleNetTest.m

%% The pepper
% Read the image to classify
I  = imread('peppers.png');
sz = net.Layers(1).InputSize;
I  = I(1:sz(1),1:sz(2),1:sz(3));
[label, scorePeppers] = classify(net, I);
NewFigure('Pepper');
imshow(I);
title(label);

% What other categories are similar?
disp('Categories with highest scores for Peppers:')
kPos = find(scorePeppers>0.01);
[vals,kSort] = sort(scorePeppers(kPos),'descend');
for k = 1:length(kSort)
  fprintf('%13s:\t%g\n',net.Layers(end).Classes(kPos(kSort(k))),vals(k));
end


As before, the image is correctly identified as having a bell pepper, and the score is similar 
to AlexNet. However, the remaining categories are a little different. In this case, the cucumber, 
and for some reason, a maraca, scored higher than a Granny Smith. Maracas are also round and 
oblong. The highest categories are shown as follows. 


Categories with highest scores for Peppers: 
bell pepper: 0.708213 


cucumber:  0.0955994 
maraca: 0105089318 
Granny Smith: 0.0278589 


We also test this net on the images of the cat and box. The image size for this network is 
224x224. The categories for the cat are the same, with the addition of a lynx, and note that the 
tabby score is significantly lower than for AlexNet. 


Categories with highest scores for Cat: 


tabby: 0.532261 
Egyptian cat: 0.233229
tiger cat: 0.0790764 
lynx: OSOS 2757. 
Figure 11.4: GoogLeNet scores for cat, left, and box, right.


The box scores prove the most interesting, and while hard disc is among the highest scores, 
in this case the net returns iPod. A cellular telephone is added to the mix this time. The net 
clearly knows that it is a rectangular metal object, but beyond that there is no clear evidence for 
one category over another. 


Categories with highest scores for Box: 


iPod: 0.443666 

hard disc: 0.212672 
cellular telephone: 0.0787301 
modem: 0.0766429 
pick: 0.0545631 

switch: 0.0169888
scale: 0.0165957 

remote control: 0.0154203 


The GoogLeNet score arrays for cat and box are shown in Figure 11.4. The box scores are
visibly spread all over the place. This reinforces that the choice of “iPod” is less certain than
the pepper or cat. This shows that even highly trained networks are not necessarily reliable if
the input strays too far from the training set.
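
One simple way to put a number on “less certain” is the gap between the best and second-best
scores. The following is a minimal sketch using the top two GoogLeNet scores printed in the
listings of this section; the margin measure itself is an illustration and is not something the
networks report.

```matlab
% Sketch: margin between the top two scores as a rough confidence measure.
% The scores are the GoogLeNet values printed in this section.
names  = {'Pepper','Cat','Box'};
top1   = [0.708213 0.532261 0.443666];    % bell pepper, tabby, iPod
top2   = [0.0955994 0.233229 0.212672];   % cucumber, Egyptian cat, hard disc

margin = top1 - top2;                     % larger margin = clearer winner
for k = 1:numel(names)
  fprintf('%8s: top %.4f, runner-up %.4f, margin %.4f\n',...
          names{k},top1(k),top2(k),margin(k));
end
```

The margin shrinks from pepper to cat to box, which matches the visual impression from the score
plots.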


The summary of the GoogLeNet results is:


GoogleNet results summary:


Pepper 0.7082
Cat 0.5323
Box 0.4437


We can also grab random images from the Internet. The site https://picsum.photos calls
itself the “Lorem Ipsum” for photos, and provides a random photo with every call to the URL.
Four examples are shown in Figure 11.5.
Figure 11.5: Volcano, lakeside, seashore and geyser.


Figure 11.6: Author headshots with GoogLeNet labels.


Consider, for example


>> I = imread('https://picsum.photos/224/224');
>> figure, imshow(I);
>> title(classify(net,I))


We got some interesting results using this web site. It produces good results for some 
landscape photos, but other times sees objects that are not there, see Figure 11.5. 

These nets are not trained on people; however it can be interesting to test them on images 
of people. We tested GoogLeNet on our author headshots in Figure 11.6. In both cases, it 
identified our clothing fairly accurately! 

While these nets perform very well on images that do in fact exist in their database, from 
lions to landscapes, it is important to remember that they are limited in application. Results can 
be unexpected and even silly. 
CHAPTER 12


Orbit Determination


12.1 Introduction 


Determining orbits from measurements has been done for hundreds of years. The general ap- 
proach is to take a series of measurements of the object from the ground. This is a set of angles 
at different times. Given the location on the Earth, and this set of data, one can reconstruct the 
orbit. Ideal orbits, which make the assumption that the Earth's gravity is a point at the center of 
the Earth, are conic sections. Those that stay near the Earth are ellipses. These can be defined 
as a set of orbital elements. In this chapter, we will design a neural network to find the values 
for two of the elements. Our model will be simpler than that which astronomers must use. We 
will assume that all of our orbits are in the Earth’s equatorial plane and that the observer is at 
the center of the Earth. 

The purpose of this chapter is to show that a neural net can do orbit determination. For 
comparison with traditional methods, see the classic textbook from 1965 by Escobal [11]. 


12.2 Generating the Orbits 
12.2.1 Problem 


We want to create a set of orbits for testing and training a neural net. 


12.2.2 Solution 


Implement a random orbit generator using Keplerian propagation of elements. 


12.2.3 How It Works 


An orbit involves at least two bodies, for example, a planet and a spacecraft. In the ideal two- 
body case, the two bodies rotate about the common center of mass, known as the barycenter. 
For all practical spacecraft cases, the spacecraft mass itself is negligible, and this means that the 
satellite follows a conic section path about the primary body’s center of mass. A conic section 
is a curve that fits on a cone, as shown in Figure 12.1. Two conics, a circle and ellipse, are 
drawn. Hyperbolas and parabolas are also conic sections, but we will only look at elliptical 
orbits in this chapter. 
Figure 12.1: Ellipse and circle on a cone and viewed along their normal.


The code that draws this picture is in the following script. It calls two functions, Cone and
ConicSectionEllipse. r0 and h are only needed to draw the cone. The algorithm only
cares about theta, the cone half angle.


ConicSection.m

theta = pi/4;
h     = 4;   % value partly illegible in the source; 4 is consistent with the circle drawn at height 2 below
r0    = h*sin(theta);
ang   = linspace(0,2*pi);
a     = 2;   % assumed; the value is illegible in the source
b     = 1;
cA    = cos(ang);
sA    = sin(ang);
n     = length(cA);
c     = 0.5*h*sin(theta)*[cA;sA;ones(1,n)];
e     = [a*cA;b*sA;zeros(1,n)];

% Show a planar representation
NewFigure('Orbits');
plot(c(1,:),c(2,:),'r')
hold on
plot(e(1,:),e(2,:),'g')
grid
xlabel('x')
ylabel('y')
axis image
legend('Circle','Ellipse');

% The output list is partly illegible in the source; z, phi, and x are used below
[z,phi,x] = ConicSectionEllipse(a,b,theta);
ang    = pi/2 + phi;
e      = [cos(ang) 0 sin(ang);0 1 0; -sin(ang) 0 cos(ang)]*e;
% The next two lines are reconstructed from the line call below
e(1,:) = e(1,:) + x;
e(3,:) = e(3,:) + h - z;

Cone(r0,h,40);
hold on
plot3(c(1,:),c(2,:),2*ones(1,n),'r','linewidth',2);
plot3(e(1,:),e(2,:),e(3,:),'g','linewidth',2);
line([x x],[-b b],[h-z h-z],'color','g','linewidth',2);
view([0 1 0])

The view is set to look along the y-axis which is the axis of rotation for the ellipse. The function
Cone draws the cone. line draws the axis of rotation that is along the short axis.

The solution that is used to draw the conic sections is derived in the last section of this
chapter. The orbit may be elliptical, with an eccentricity less than 1, parabolic with an
eccentricity equal to 1, or hyperbolic with an eccentricity greater than 1. Figure 12.2 shows the
geometry of an elliptical orbit. This is a planar orbit in which the orbital motion is
two-dimensional. The semi-major axis a is

$$a = \frac{r_a + r_p}{2} \tag{12.1}$$


where $r_a$ is the apoapsis (apogee for the Earth) radius, or point furthest from the central
planet, and $r_p$ is the periapsis radius (perigee for the Earth), or closest point to the planet.
The eccentricity, e, of the orbit is

$$e = \frac{r_a - r_p}{r_a + r_p} \tag{12.2}$$

When $r_a = r_p$, the orbit is circular and e = 0. This formula is not meaningful for parabolic
or hyperbolic orbits. Figure 12.2 shows three angular measurements: M, the mean anomaly; E, the
eccentric anomaly; and ν, the true anomaly. All are measured from periapsis. The mean anomaly is
related to the mean orbit rate n through a simple function of time.

$$M = M_0 + n(t - t_0) \tag{12.3}$$


Figure 12.2: Elliptical orbit.


The eccentric anomaly is the angle to the current position as projected onto the ellipse’s
circumscribing circle, drawn in blue. It is related to the mean anomaly by Kepler’s equation.

$$M = E - e\sin E \tag{12.4}$$


This equation needs to be solved numerically in general, but for small values of e, e < 0.1, this
approximation can be used.

$$E \approx M + e\sin M + \frac{1}{2}e^2\sin 2M \tag{12.5}$$

This is because apoapsis is not well defined for very small e. Higher-order formulas can also
be found. The true anomaly is related to the eccentric anomaly through the equation

$$\tan\frac{\nu}{2} = \sqrt{\frac{1+e}{1-e}}\,\tan\frac{E}{2} \tag{12.6}$$

Finally, the orbit radius is

$$r = \frac{a(1-e^2)}{1+e\cos\nu} \tag{12.7}$$

If e ≥ 1 in this equation, r will go to ∞, as is expected for parabolic or hyperbolic orbits.
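
The chain from mean anomaly to radius can be sketched numerically. This is a minimal sketch,
assuming Newton's method for Kepler's equation (the text does not say which numerical method the
authors use) and illustrative values for e, a, and M. The variable rCheck uses the equivalent
standard form r = a(1 − e cos E) as a cross-check.

```matlab
% Sketch: solve Kepler's equation M = E - e*sin(E) with Newton's method,
% then compute the true anomaly and the orbit radius.
e = 0.3;             % eccentricity (illustrative)
a = 8000;            % semi-major axis, km (illustrative)
M = 1.0;             % mean anomaly, rad (illustrative)

E = M + e*sin(M);    % starting guess from the first two terms of the series
for k = 1:20
  dE = (E - e*sin(E) - M)/(1 - e*cos(E));   % Newton step
  E  = E - dE;
  if abs(dE) < 1e-12
    break
  end
end

nu = 2*atan(sqrt((1+e)/(1-e))*tan(E/2));    % true anomaly
r  = a*(1-e^2)/(1+e*cos(nu));               % orbit radius

% Cross-check with the equivalent form in terms of E
rCheck = a*(1 - e*cos(E));
```

For an elliptical orbit the derivative 1 − e cos E never reaches zero, so the Newton step is
always defined.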

Seven parameters are necessary to define an orbit of a spacecraft about a spherically
symmetric body. One is the gravitational parameter, generally denoted by the symbol μ. The
gravitational parameter is

$$\mu = G(m_1 + m_2) \tag{12.8}$$

where $m_1$ is the mass of the central body and $m_2$ is the mass of the orbiting body. G is the
gravitational constant, approximately 6.674 × 10^-11 m^3/kg/s^2. μ for the Earth is
3.98600436 × 10^14 m^3/s^2 (3.98600436 × 10^5 km^3/s^2). There are many ways of representing the
other six elements. The two most popular sets are position and velocity (r and v) vectors, and
Keplerian orbital elements. Each representation uses six independent variables to describe the
orbit, plus μ. Both are shown in Figure 12.3.
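
The size of each term in the gravitational parameter can be checked in a few lines. This is a
minimal sketch, assuming standard reference values for G and the Earth's mass and a hypothetical
1000 kg spacecraft; none of these three numbers come from the text.

```matlab
% Sketch: gravitational parameter from G and the two masses.
G      = 6.6743e-11;          % gravitational constant, m^3/kg/s^2 (reference value)
mEarth = 5.9722e24;           % Earth mass, kg (reference value)
mSat   = 1000;                % hypothetical spacecraft mass, kg

mu     = G*(mEarth + mSat);   % m^3/s^2
muKm   = mu*1e-9;             % km^3/s^2
relSat = mSat/mEarth;         % relative contribution of the spacecraft

fprintf('mu = %.6e km^3/s^2, spacecraft share %.1e\n',muKm,relSat);
```

The result agrees with the quoted value of μ to within about one part in 10^5, and the spacecraft
contributes far less than that, which is why its mass is neglected.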

The Keplerian elements are defined as follows. Two elements define the elliptical orbit.
The size of the orbit is determined by the semi-major axis a, which is the average of the perigee
radius and apogee radius. The shape of the orbit is defined by the eccentricity, e.
Two elements define the orbital plane. Ω is the longitude, which is the right ascension of the
ascending node, or the angle from the +X axis of the reference frame to the line where the
orbit plane intersects the xy-plane. i is the inclination and is the angle between the xy-plane
and the orbit plane. ω is the argument of perigee and is the angle in the orbit plane between the
ascending node line and perigee (where the orbit is closest to the center of the central body). ν
is the true anomaly and is the angle between perigee and the spacecraft. The mean anomaly M
may be used in the element set instead of ν. M or ν tells us where the spacecraft is in its orbit.
Figure 12.3: Orbital elements. The underlying plot was drawn using DrawEllipticOrbit.

To summarize, the Keplerian elements are

$$\begin{bmatrix} a & i & \Omega & \omega & e & M \end{bmatrix}^T \tag{12.9}$$

The orbit period, with units of seconds, is

$$P = 2\pi\sqrt{\frac{a^3}{\mu}} \tag{12.10}$$

The orbit parameter, with units of distance (conventionally km), is

$$p = a(1-e)(1+e) \tag{12.11}$$

The in-plane position and velocity are

$$r_P = \frac{p}{1+e\cos\nu}\begin{bmatrix}\cos\nu\\ \sin\nu\\ 0\end{bmatrix} \tag{12.12}$$

$$v_P = \sqrt{\frac{\mu}{p}}\begin{bmatrix}-\sin\nu\\ e+\cos\nu\\ 0\end{bmatrix} \tag{12.13}$$
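
The period and the orbit parameter are quick to evaluate. This is a minimal sketch, assuming an
orbit with a = 8000 km and e = 0.3, which are illustrative values inside the range generated later
in this chapter.

```matlab
% Sketch: orbit period, mean orbit rate, and orbit parameter.
mu = 3.98600436e5;        % km^3/s^2
a  = 8000;                % semi-major axis, km (illustrative)
e  = 0.3;                 % eccentricity (illustrative)

P  = 2*pi*sqrt(a^3/mu);   % orbit period, s
n  = sqrt(mu/a^3);        % mean orbit rate, rad/s
p  = a*(1-e)*(1+e);       % orbit parameter, km

fprintf('Period %.1f s (%.1f min), parameter %.1f km\n',P,P/60,p);
```

The period comes out at roughly 7121 s, a little under two hours, and n multiplied by P is 2π as
expected.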
The transformation matrix from planar to three-dimensional coordinates is

$$c = \begin{bmatrix}
\cos\Omega\cos\omega - \sin\Omega\sin\omega\cos i & -\cos\Omega\sin\omega - \sin\Omega\cos\omega\cos i & \sin\Omega\sin i\\
\sin\Omega\cos\omega + \cos\Omega\sin\omega\cos i & -\sin\Omega\sin\omega + \cos\Omega\cos\omega\cos i & -\cos\Omega\sin i\\
\sin\omega\sin i & \cos\omega\sin i & \cos i
\end{bmatrix} \tag{12.14}$$

That is

$$r = c\,r_P \tag{12.15}$$

$$v = c\,v_P \tag{12.16}$$

For the purposes of creating the neural net, we will look at orbits with the inclination, i = 0,
and the ascending node, Ω = 0. The transformation matrix reduces to a rotation about z.

$$c = \begin{bmatrix}
\cos\omega & -\sin\omega & 0\\
\sin\omega & \cos\omega & 0\\
0 & 0 & 1
\end{bmatrix} \tag{12.17}$$

We now want to propagate the orbit forward in time. There are two alternative approaches
for doing so. One approach is to use Keplerian propagation, where we keep five of the elements
constant, and simply march the mean anomaly forward in time at a constant rate of
$n = \sqrt{\mu/a^3}$. At each point in time, we can convert the set of six orbital elements into
a new position and velocity. This approach is limited though, in that it assumes the orbit follows
a Keplerian orbit (the only external force is the gravitation of a central body with uniform mass
distribution). The second approach, which gives us more flexibility for external forces like
thrust and drag, is to numerically integrate the dynamic equations of motion. The state equations
for orbit propagation are

$$\dot{v} = -\mu\frac{r}{|r|^3} + a \tag{12.18}$$

$$\dot{r} = v \tag{12.19}$$

The terms on the right-hand side of the velocity derivative equation are the point mass gravity
acceleration with additional acceleration a. This is implemented in RHSOrbit.


function xDot = RHSOrbit(~,x,d)

r    = x(1:2);
v    = x(3:4);
xDot = [v;-d.mu*r/(r'*r)^1.5 + d.a];
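
The right-hand side can be exercised with a fixed-step integrator. This is a minimal sketch,
assuming a classical fourth-order Runge-Kutta step written inline; it stands in for the authors'
RungeKutta function, whose source is not shown here, and rhs is an anonymous-function copy of the
same dynamics. A circular orbit is used because its radius should stay constant.

```matlab
% Sketch: integrate the planar two-body equations with fixed-step RK4.
mu  = 3.98600436e5;                      % km^3/s^2
acc = [0;0];                             % perturbing acceleration
rhs = @(x) [x(3:4); -mu*x(1:2)/(x(1:2)'*x(1:2))^1.5 + acc];

a   = 8000;                              % circular orbit radius, km
x   = [a;0;0;sqrt(mu/a)];                % state [r;v] at the start
dT  = 2;                                 % time step, s
nSim = 2000;                             % number of steps

for k = 1:nSim
  k1 = rhs(x);
  k2 = rhs(x + 0.5*dT*k1);
  k3 = rhs(x + 0.5*dT*k2);
  k4 = rhs(x + dT*k3);
  x  = x + dT*(k1 + 2*k2 + 2*k3 + k4)/6;
end

rEnd  = norm(x(1:2));                    % should stay near 8000 km
theta = atan2(x(2),x(1));                % the angle an observer would record
```

With a nonzero acc the same loop models thrust or drag, which is the flexibility that Keplerian
propagation lacks.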


We will create a script that simulates multiple orbits. The simulation will use RHSOrbit. 
The first part of the orbit generation script sets up the random orbital elements. 
Orbits.m

%% Generate Orbits for angles-only element estimation
% Saves a mat-file called OrbitData.
%% See also
% El2RV, RungeKutta, RHSOrbit, TimeLabel, PlotSet

nEl  = 500;            % Number of sets of data
d    = struct;         % Initialize (right-hand side illegible in the source; an empty struct is assumed)
d.mu = 3.98600436e5;   % Gravitational parameter, km^3/s^2
d.a  = [0;0];          % Perturbing acceleration

% Random elements
e = 0.6*rand(1,nEl);             % Eccentricity
a = 8000 + 1000*randn(1,nEl);    % Semi-major axis
M = 0.25*pi*rand(1,nEl);         % Mean anomaly

The next section runs the simulations and saves the angles. Each simulation has 2000 steps, 
and each step is 2 seconds. We are only using one in ten points for the orbit determination. 
We save the orbital elements for testing the neural network. We are not applying any external 
acceleration. We could have used Kepler propagation, but by simulating the orbit, we have the 
option for studying how well the neural network performs with disturbances. 


%% Set up the simulation
nSim = 2000;   % Number of simulation steps
dT   = 2;      % Time step

% Only use some of the sim steps
jUse = 1:10:nSim;

%% Data for Deep Learning
data = cell(nEl,1);

%% Simulate each of the orbits
xP = zeros(4,nSim);
t  = (0:(nSim-1))*dT;
% (one line is illegible in the source here; it appears to preallocate el)

for k = 1:nEl
  [r,v] = El2RV([a(k) 0 0 0 e(k) M(k)]);
  x     = [r(1:2);v(1:2)];
  xP    = zeros(4,nSim);
  for j = 1:nSim
    xP(:,j) = x;
    x       = RungeKutta( @RHSOrbit, 0, x, dT, d );
  end
  data{k} = atan2(xP(2,jUse),xP(1,jUse));
  el(k).a = a(k);
  el(k).e = e(k);
end
Figure 12.4: The last test orbit. The measured angle is on the right. These are only showing the
data used in orbit determination.

The final part plots the orbits and saves the data to a file.


%% Save for the Deep Learning algorithm
save('OrbitData','data','el');


The last orbit is shown in Figure 12.4. The jump in angle is due to angles being defined
from −π to +π. We could have used unwrap to get rid of this jump. We are only measuring
for part of an orbit. We can set up the simulation to measure any part of an orbit, or even
multiple orbits.
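
The jump and its removal can be reproduced on a synthetic angle history. This is a minimal sketch,
assuming an angle that grows linearly past π; the variable names are illustrative.

```matlab
% Sketch: atan2 wraps angles into (-pi, pi]; unwrap removes the jump.
theta   = linspace(0,3*pi,50);             % true angle, growing past pi
wrapped = atan2(sin(theta),cos(theta));    % what atan2 reports
smooth  = unwrap(wrapped);                 % 2*pi jumps removed

jumpSize = max(abs(diff(wrapped)));        % close to 2*pi at the wrap
```

unwrap only works when consecutive samples differ by less than π, so the sampling interval has to
be short compared with the orbit period.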


12.3 Training and Testing 
12.3.1 Problem 


We want to build a deep learning system to compute the eccentricity and semi-major axis for 
an orbit from angle measurements. 


12.3.2 Solution 


The orbit history is a time series of angles. We will take angles at uniform time intervals. We
will use fitnet to fit the data.


12.3.3 How It Works


We load in the data from the mat file and separate it into training and testing sets. 


OrbitNeuralNet.m

%% Train and test the Orbit Neural Net
%% See also:
% Orbits, fitnet, configure, train, sim, cascadeforwardnet, feedforwardnet

s      = load('OrbitData');
n      = length(s.data);
nTrain = floor(0.9*n);

%% Set up the training and test sets
kTrain = randperm(n,nTrain);
sTrain = s.data(kTrain);
nSamp  = size(sTrain{1},2);
xTrain = zeros(nSamp,nTrain);
aMean  = mean([s.el(:).a]);

for k = 1:nTrain
  xTrain(:,k) = sTrain{k}(1,:);
end

elTrain = s.el(kTrain);
yTrain  = [elTrain.a;elTrain.e];

% Normalize the data so it is the same magnitude as the eccentricity
yTrain(1,:) = yTrain(1,:)/aMean;

kTest = setdiff(1:n,kTrain);
sTest = s.data(kTest);
nTest = n-nTrain;
xTest = zeros(nSamp,nTest);
for k = 1:nTest
  xTest(:,k) = sTest{k}(1,:);
end

elTest = s.el(kTest);
yTest  = [elTest.a;elTest.e];
yTest(1,:) = yTest(1,:)/aMean;


The neural network will use sequences of angles and their related times as the input. The
output will be the two orbital elements: semi-major axis and eccentricity. In general, if we know
the position and velocity at a point in the orbit, we can always compute the orbital elements.
The function El2RV, used in Orbits.m, performs the reverse conversion from elements to position
and velocity. Although we don't directly measure velocity, it can be estimated by differencing
position measurements. With angle-only measurements, we don't have a measure of range. The
question is, can the neural network infer the range from the time variation of the angles?
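
A special case shows why the answer can be yes. For a circular orbit the angle grows at the mean
orbit rate n, and n fixes the semi-major axis through n = sqrt(μ/a³). The following is a minimal
sketch, assuming a noise-free circular orbit sampled every 20 seconds; it illustrates the
principle only and is not the neural network method used in this chapter.

```matlab
% Sketch: recover the semi-major axis of a circular orbit from angles alone.
mu    = 3.98600436e5;          % km^3/s^2
aTrue = 8000;                  % km, the value to be recovered
t     = 0:20:4000;             % sample times, s
theta = sqrt(mu/aTrue^3)*t;    % noise-free, unwrapped angle history, rad

nEst  = t(:)\theta(:);         % least-squares slope through the origin
aEst  = (mu/nEst^2)^(1/3);     % invert n = sqrt(mu/a^3)
```

For an eccentric orbit the angular rate is no longer constant. It is fastest at perigee and
slowest at apogee, and that variation is what carries information about the eccentricity.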

We train the network using fitnet. Note that we normalized the semi-major axis so that 
the magnitude is the same order as the eccentricity. This improves the fitting. 
%% Train the network
net = fitnet(10);

net = configure(net, xTrain, yTrain);
net.name = 'Orbit';
net = train(net,xTrain,yTrain);


We use the test data to test the network. 


%% Test the network
yPred      = sim(net,xTest);
yPred(1,:) = yPred(1,:)*aMean;
yTest(1,:) = yTest(1,:)*aMean;
yM         = mean(yPred-yTest,2);
yTM        = mean(yTest,2);
fprintf('\nFit Net\n');
fprintf('Mean semi-major axis error %12.4f (km) %12.2f %%\n',yM(1),100*...
  abs(yM(1))/yTM(1));
fprintf('Mean eccentricity error    %12.4f      %12.2f %%\n',yM(2),100*...
  abs(yM(2))/yTM(2));

%% Plot the results
yL   = {'a (km)' 'e'};
yLeg = {'Predicted','True'};
PlotSet(1:nTest,[yPred;yTest],'x label','Test','y label',yL,...
  'figure title','Predictions using fitnet','plot set',{[1 3] [2 4]},...
  'legend',{yLeg yLeg});


The results are best for fitnet. However, the results will vary with each run.


>> OrbitNeuralNet

Fit Net
Mean semi-major axis error      31.9872 (km)         0.41 %
Mean eccentricity error          0.0067              2.48 %

Cascade Forward Net
Mean semi-major axis error     -89.8603 (km)
Mean eccentricity error         -0.0100              3.74 %

Feed Forward Net
Mean semi-major axis error      40.2986 (km)         0.52 %
Mean eccentricity error          0.0001              0.03 %


Figures 12.5, 12.6, and 12.7 show the test results. Both semi-major axis and eccentricity
results are reasonably good. You can experiment with different spans of data and different
sampling intervals. The code is in the script OrbitNeuralNet.m.

Figure 12.5: Test results using fitnet.

Figure 12.6: Test results using cascadeforwardnet.

Figure 12.7: Test results using feedforwardnet.

We train the network using cascadeforwardnet. The code doesn’t change except for
the function name.


%% Train the cascade forward network
net = cascadeforwardnet(10);

net = configure(net,xTrain,yTrain);
net.name = 'Orbit';
net = train(net,xTrain,yTrain);


We finally train it using feedforwardnet. 


%% Train the feed forward network
net = feedforwardnet(10);

net = configure(net,xTrain,yTrain);
net.name = 'Orbit';
net = train(net,xTrain,yTrain);
12.4 Implementing an LSTM
12.4.1 Problem


We want to build a long short-term memory neural net (LSTM) to estimate the orbital elements. 
LSTMs have been demonstrated in previous chapters. They are an alternative to the functions 
shown earlier. 


12.4.2 Solution 


The orbit history is a time series of angles. We will use a bidirectional LSTM to fit the data. 
We will take angles at uniform time intervals. 


12.4.3 How It Works 


We load the data from the mat file and separate it into training and testing sets. The data
format is different from that used for the feedforward networks. xTrain is a cell array with
one sequence per cell, but yTrain is a matrix with a row for each cell in xTrain.
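
The difference between the two layouts can be sketched with a few synthetic sequences. The sizes, angle values, and orbital elements below are made up for illustration and are not the contents of the OrbitData file; only the shapes matter.

```matlab
% Synthetic stand-ins (assumed shapes, not the data used in the chapter)
nSeq  = 4;                            % number of orbits
nSamp = 6;                            % samples per orbit
data  = cell(nSeq,1);
for k = 1:nSeq
  data{k} = linspace(0,0.1*k,nSamp);  % 1-by-nSamp row of angles
end
a = [7000 7500 8000 8500];            % semi-major axes (km), made up
e = [0.10 0.20 0.30 0.40];            % eccentricities, made up

% Feedforward layout: one column of samples and one column of targets per orbit
xFF = zeros(nSamp,nSeq);
for k = 1:nSeq
  xFF(:,k) = data{k}(1,:);
end
yFF = [a;e];                          % 2-by-nSeq

% LSTM layout: the cell array is used as is, with one row of targets per orbit
xLSTM = data;                         % nSeq-by-1 cell array
yLSTM = [a;e]';                       % nSeq-by-2

fprintf('xFF   is %d-by-%d, yFF   is %d-by-%d\n',size(xFF),size(yFF));
fprintf('xLSTM is %d-by-%d cell, yLSTM is %d-by-%d\n',size(xLSTM),size(yLSTM));
```

The transpose on the targets is the only change to the element data. The sequences themselves stay in their cells instead of being copied into the columns of a matrix.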


OrbitLSTM.m 


%% Script to train and test the Orbit LSTM
% It will estimate the orbit semi-major axis and eccentricity from a time
% sequence of angle measurements.
%% See also
% Orbits, sequenceInputLayer, bilstmLayer, dropoutLayer, fullyConnectedLayer,
% regressionLayer, trainingOptions, trainNetwork, predict

s      = load('OrbitData');
n      = length(s.data);
nTrain = floor(0.9*n);

%% Set up the training and test sets
kTrain = randperm(n,nTrain);
aMean  = mean([s.el(:).a]);
xTrain = s.data(kTrain);
nTest  = n-nTrain;

elTrain     = s.el(kTrain);
yTrain      = [elTrain.a;elTrain.e]';
yTrain(:,1) = yTrain(:,1)/aMean;
kTest       = setdiff(1:n,kTrain);
xTest       = s.data(kTest);

elTest      = s.el(kTest);
yTest       = [elTest.a;elTest.e]';
yTest(:,1)  = yTest(:,1)/aMean;

We train the network using trainNetwork.


%% Train the network with validation
numFeatures     = 1;
numHiddenUnits1 = 100;
numHiddenUnits2 = 100;
numClasses      = 2;

layers = [ ...
    sequenceInputLayer(numFeatures)
    bilstmLayer(numHiddenUnits1,'OutputMode','sequence')
    dropoutLayer(0.2)
    bilstmLayer(numHiddenUnits2,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    regressionLayer]

maxEpochs = 20;

options = trainingOptions('adam', ...
    'ExecutionEnvironment','cpu', ...
    'GradientThreshold',1, ...
    'MaxEpochs',maxEpochs, ...
    'Shuffle','every-epoch', ...
    'ValidationData',{xTest,yTest}, ...
    'ValidationFrequency',5, ...
    'Verbose',0, ...
    'Plots','training-progress');

net = trainNetwork(xTrain,yTrain,layers,options);


options is given validation data. Note the cell array that is required for the validation
data.

'ValidationData',{xTest,yTest}, ...

We shuffle the data. This generally improves the results since the learning algorithm sees
the data in a different order on each epoch. We use the test data to test the network. predict
produces results based on the test data. This is the same data used for validation during learning.
The results are given as follows.

>> OrbitLSTM

layers =
  6x1 Layer array with layers:

     1   ''   Sequence Input      Sequence input with 1 dimensions
     2   ''   BiLSTM              BiLSTM with 100 hidden units
     3   ''   Dropout             20% dropout
     4   ''   BiLSTM              BiLSTM with 100 hidden units
     5   ''   Fully Connected     2 fully connected layer
     6   ''   Regression Output   mean-squared-error

biLSTM
Mean semi-major axis error     -63.4780 (km)
Mean eccentricity error          0.0024


We use two BiLSTM layers with a 20% dropout between layers. Dropout randomly removes neurons
during training and helps prevent overfitting. Overfitting is when the results correspond too
closely to a particular set of data. This makes it hard for the trained network to identify
patterns in new data. The first BiLSTM layer produces a sequence as its output. The second
BiLSTM layer's 'OutputMode' is set to 'last'. numClasses is 2 because we are estimating two
parameters. The fully connected layer connects the BiLSTM outputs to the two parameters
we want to identify in the regression layer. The training window is shown in Figure 12.8. We
could have continued the training for more epochs as the root-mean-square error (RMSE) is
still improving.
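
The effect of a 20% dropout can be sketched in a few lines. This is the common "inverted dropout" scheme, which to our understanding is also how dropoutLayer behaves during training. The vector of ones is a made-up stand-in for a layer's activations, and the names mask and xDrop are illustrative.

```matlab
p = 0.2;                       % dropout probability, as in dropoutLayer(0.2)
x = ones(1,10000);             % stand-in for a layer's activations (assumed)

% Training: zero each element with probability p and rescale the survivors
% so that the expected value of the output matches the input
mask  = rand(size(x)) >= p;
xDrop = x.*mask/(1-p);

% Prediction: dropout is inactive and the input passes through unchanged
xPredict = x;

fprintf('Fraction dropped     : %6.3f\n',mean(~mask));
fprintf('Mean during training : %6.3f\n',mean(xDrop));
fprintf('Mean during predict  : %6.3f\n',mean(xPredict));
```

Because a different random mask is drawn on every pass, the network cannot rely on any single neuron, which is what discourages it from fitting the training set too closely.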

This particular set of layers is to show you how to build a neural network. It is by no means
the "best" architecture for this problem. We also tried a single LSTM layer; a single BiLSTM
layer worked better.

Figure 12.9 shows the test results. The results are not quite as good as the feedforward
nets given earlier. We've only used two layers. From Chapter 11, you see that "professional"
networks can have dozens if not hundreds of layers. The difference is due to the smaller number
of neurons in the LSTM. You can experiment with this network to improve the results.

In this chapter, we have compared two approaches, in MATLAB, to solving the orbit
determination problem. Using the MATLAB functions worked a bit better than the LSTM we
implemented. We made the argument of perigee constant to make the problem easier. The next
step would be to try to find the full set of orbital elements and then try to design a system that
works from a fixed point on the Earth. In the latter case, we would need to account for the
rotation of the Earth. Another improvement would be to take the measurements at different time
steps. For an elliptical orbit, taking many measurements at perigee is more productive than at
apogee because the spacecraft is moving faster. One could write a preprocessor to select inputs
to our neural network based on the angular change with respect to time. Orbit determination
systems, using algorithmic approaches, can also compute errors in the observer's location. You
could also try other measurements, such as range and range rate. These measurements are used
for deep space and geosynchronous spacecraft.
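
One possible form of such a preprocessor is sketched here. It keeps a sample only when the angle has changed by at least a threshold since the last sample kept, so the selected samples cluster in time around perigee. The orbit, the threshold dTheta, and the variable names are assumptions for illustration; the angles are generated from Kepler's equation rather than taken from the chapter's data.

```matlab
% Made-up elliptical orbit, used only to generate angles versus time
mu = 398600.44;                         % km^3/s^2
a  = 10000;                             % km
e  = 0.4;
n  = sqrt(mu/a^3);                      % mean motion (rad/s)
T  = 2*pi/n;                            % period (s)
t  = linspace(0,0.98*T,400);            % uniform samples, just short of one period
M  = n*t;                               % mean anomaly
E  = M;
for k = 1:20                            % Newton iterations on Kepler's equation
  E = E - (E - e*sin(E) - M)./(1 - e*cos(E));
end
theta = 2*atan2(sqrt(1+e)*sin(E/2),sqrt(1-e)*cos(E/2));   % true anomaly

% Keep a sample each time the angle has moved by dTheta since the last kept one
dTheta = 0.1;                           % rad, assumed threshold
kKeep  = 1;
for k = 2:length(theta)
  if abs(theta(k) - theta(kKeep(end))) >= dTheta
    kKeep(end+1) = k;
  end
end
tSel     = t(kKeep);
thetaSel = theta(kKeep);

fprintf('Kept %d of %d samples\n',length(kKeep),length(t));
```

The selected sequence no longer has uniform time spacing, so the sample times would need to be passed to the network along with the angles.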
Figure 12.8: Training window.

12.5 Conic Sections


For a given ellipse and cone, we need to solve for the location of the center of the plane that
cuts the cone and its angle. The problem can be solved by working in the $xz$-plane. This is
shown in Figure 12.10. The equation for an ellipse is

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \tag{12.20}$$

Figure 12.10: Conic section.

The equation for a right circular cone is

$$x^2 + y^2 = \gamma^2 z^2 \tag{12.21}$$

where $\gamma = \cos\theta$ and $\theta$ is the cone half angle. This is just saying that the radius of the cone at
position $z$ is $\gamma z$. The equation for a plane is

$$z = h \tag{12.22}$$


That is, $z$ is constant for all $x$ and $y$. If we rotate this about the $y$-axis by $\alpha$ and translate it by $x_0$,
we get the equation for the plane that cuts the cone. The ellipse comes from the intersection of
the plane with the cone. $b$ is along the $y$-axis. The angle between the plane that cuts the cone
and the $xy$-plane is $\alpha$ and is a function of $a$, $b$, and the cone half angle $\theta$. The following
equations are for $\theta = \pi/4$.

$$\tan^2\alpha = 1 - \frac{b^2}{a^2} \tag{12.23}$$

Noting that

$$\alpha = \frac{\pi}{2} - \phi \tag{12.24}$$

The relationship between the plane and the vertical for a cone with a half angle of $\pi/4$ is then

$$\phi = \frac{\pi}{2} - \arctan\sqrt{1 - \frac{b^2}{a^2}} \tag{12.25}$$


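The cone can be viewed in the plane. On the right side, the equation is

$$x = \gamma z \tag{12.26}$$

where $\gamma = \cos\theta$, where $\theta$ is the cone half angle. On the left side

$$x = -\gamma z \tag{12.27}$$

We then write the equations for the line along the major axis of the ellipse on each side of the
triangle. On the right

$$x = x_0 + a\cos\alpha \tag{12.28}$$

$$z = h - a\sin\alpha \tag{12.29}$$

where $\alpha = \pi/2 - \phi$. On the left

$$x = x_0 - a\cos\alpha \tag{12.30}$$

$$z = h + a\sin\alpha \tag{12.31}$$

Substituting into the equations for the cone, we get

$$\begin{bmatrix} 1 & -\gamma \\ 1 & \gamma \end{bmatrix} \begin{bmatrix} x_0 \\ h \end{bmatrix} = a \begin{bmatrix} -\gamma\sin\alpha - \cos\alpha \\ \cos\alpha - \gamma\sin\alpha \end{bmatrix} \tag{12.32}$$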
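
The substitution step can be written out explicitly. The lines below are a worked sketch using only Equations (12.26) through (12.31): each endpoint of the major axis is placed on its side of the cone, and the unknowns $x_0$ and $h$ are collected on the left. These two rearranged equations are the rows of the matrix equation (12.32).

```latex
\begin{align*}
\text{Right side, (12.28) and (12.29) in (12.26):}\quad
  x_0 + a\cos\alpha &= \gamma\,(h - a\sin\alpha) \\
  x_0 - \gamma h &= a\,(-\gamma\sin\alpha - \cos\alpha) \\[6pt]
\text{Left side, (12.30) and (12.31) in (12.27):}\quad
  x_0 - a\cos\alpha &= -\gamma\,(h + a\sin\alpha) \\
  x_0 + \gamma h &= a\,(\cos\alpha - \gamma\sin\alpha)
\end{align*}
```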


The code that solves the equations follows. We could have solved the inverse analytically since
it is 2 by 2.

ConicSectionEllipse.m

function [h,phi,x] = ConicSectionEllipse(a,b,theta)

% Demo with a 2-by-1 ellipse and a 45 degree cone half angle
if( nargin < 1 )
  [h,phi,x] = ConicSectionEllipse(2,1,pi/4);
  fprintf('h   = %12.4f\n',h);
  fprintf('phi = %12.4f (rad)\n',phi);
  fprintf('x   = %12.4f\n',x);
  clear h
  return
end

% Angle of the cutting plane from the vertical
phi   = pi/2 - atan(sqrt(1-b^2/a^2));

alpha = pi/2 - phi;

% Solve the 2-by-2 system for the plane offset x and height h
c     = cos(alpha);
s     = sin(alpha);
gamma = cos(theta);
f     = a*[-gamma*s - c;c - gamma*s];
q     = [1 -gamma;1 gamma]\f;
x     = q(1);
h     = q(2);
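
Because the system is only 2 by 2, adding and subtracting its two rows gives a closed form: the offset is -a*gamma*sin(alpha) and the height is a*cos(alpha)/gamma. The sketch below checks that closed form against the backslash solve for the demo inputs. The closed-form expressions are worked out here for illustration and are not part of the listing; the names xClosed and hClosed are hypothetical.

```matlab
% Demo inputs from the listing
a     = 2;
b     = 1;
theta = pi/4;

alpha = atan(sqrt(1-b^2/a^2));
c     = cos(alpha);
s     = sin(alpha);
gamma = cos(theta);

% Numerical solution, as in the listing
f = a*[-gamma*s - c;c - gamma*s];
q = [1 -gamma;1 gamma]\f;

% Closed form from adding and subtracting the two rows
xClosed = -a*gamma*s;
hClosed =  a*c/gamma;

fprintf('x: %12.6f (backslash) %12.6f (closed form)\n',q(1),xClosed);
fprintf('h: %12.6f (backslash) %12.6f (closed form)\n',q(2),hClosed);
```

The two columns should agree to within rounding error, which is a quick way to confirm the signs in the right-hand side vector.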